Course: EPIB 679-001 Clinical Epidemiology
Date: May 9 to June 3, 8:35 – 11:40
Session 5: Cohort studies
Dr. J. Brophy

Learning objectives
• To understand the concepts of the different study designs
• To learn the advantages and disadvantages of the different study designs

Epidemiology: Definition
• “The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems” (Last, 2004)

Basic Question of Epidemiology

Epidemiology
Epidemiology is:
(1) health and medically oriented
(2) based on POPULATIONS
(3) statistical in nature (quantitative)
(4) multi-disciplinary

Epidemiology
• Epidemiologists are:
– Scientists
– Population Doctors
– Match-makers (bring together various disciplines to answer health-related questions)

Choice of study design

Validity & Reproducibility
[Figure labels: Accuracy; Biased but reliable]

Definition: Reliability
• The degree of stability exhibited when a measurement is repeated under identical conditions.
• Synonyms: Repeatability, reproducibility
• Antonym: Uncertainty
[Figure labels: Valid and reliable; Valid but unreliable]

OVERVIEW OF STUDY DESIGNS
TYPES OF STUDIES
DESCRIPTIVE
  GOAL
  • Hypothesis generating
  • Resource allocation
  • Educational needs
  POPULATION LEVEL
  • Ecologic or Correlational studies
  INDIVIDUAL LEVEL
  • Drug utilization studies
  • Case reports / series
  • Cross-sectional surveys
ANALYTICAL
  GOAL
  • Hypothesis testing
  • Provide evidence to establish causality
  OBSERVATIONAL
  Cohort studies
  • Prospective vs retrospective
  • Field vs database studies
  Nested case-control studies
  • Retrospective
  • Field vs database studies
  Case-cohort studies
  • Retrospective
  • Field vs database studies
  Case-control studies
  • Prospective vs retrospective
  • Field vs database studies
  Case-crossover studies
  • Prospective vs retrospective
  • Field vs database studies
  EXPERIMENTAL / INTERVENTIONAL
  Randomized controlled trials
  • Prospective
  • Field

Study designs - RCTs
• Randomized controlled trials (RCTs) as the “gold standard”
• Limitations of RCTs
– Highly controlled conditions
– Short follow-up
– Highly selected populations
– Small sample size
• Justification for RCT = clinical equipoise

Randomized controlled trials
• Strongest level of evidence … Caution! Not necessarily bias free
• Distinguishing feature: exposure is randomly assigned by investigator
• Major strength
– Randomization: all differences between groups are balanced across treatment arms
– Comparing effect of drug only … Caution!

Descriptive study designs
• Use of drugs & distribution of outcomes
– Person
– Time
– Place
• Goal
– Resource allocation & education
– Hypothesis generating

Descriptive studies
• Types of descriptive studies
– Population level information
– Individual level information
• Strengths
– Relatively quick, easy & inexpensive
– Provide justification for more expensive study
• Limitations
– Cannot establish causality

Analytical studies
• Types of analytical studies
– Interventional / Experimental
– Observational
Strongest
• Randomized controlled trials
• Cohort studies
• Nested case-control studies
• Case-control studies
• Case-crossover
Weakest
• Major difference

Definition: Cohort
• From the Latin cohors – warriors, the tenth part of a legion. Any group of persons (usually sharing some common characteristic) who are followed up or traced over a period of time.

Observational studies
• Distinguishing feature: investigator does not control the exposure
• Types of observational studies
– Cohort studies
– Case-control studies
• Classification
– Prospective vs retrospective study
– Field vs database study

Populations and General Design
• Overarching population (universe) about which we would like to make inferences
• Target population to which inferences are drawn
• Source population: source of persons without outcome (sampling frame)
• Nonparticipants / Participants

Schematic of a Cohort Study
[Figure: select subjects and ascertain exposure (past to present); exposed (dynamic) cohort vs non-exposed (dynamic) or reference population; assessment of outcome in each group over time; comparison]

Cohort studies
• Group that shares a common experience
• Subjects classified on the basis of exposure status
• Longitudinal studies: followed for a specified period of time until events occur
• Distinguishing feature: compare rates of events/outcomes by exposure group

Cohort studies
• If rate of event among exposed > rate of event in unexposed = harmful drug
• If rate of event among exposed < rate of event in unexposed = protective drug

Cohort design

Cohort studies
• Strengths
– Can study rare exposures
– Can study multiple outcomes
– Temporality is assured (a causality criterion)
– Unbiased selection of comparator group
– Retrospective studies are relatively quick and inexpensive … caution re: bias

Cohort studies: Potential problems
• Limitations
– Inefficient for rare events/diseases or outcomes with long induction periods
– If prospective: expensive and time consuming
• Sources of bias
– Non-participation (selection in)
– Losses to follow-up (selection out)
– Recall / interviewer bias if retrospective

Definition: Bias
• Bias: Deviation of results or inferences from the truth, or processes leading to such deviation.
1. Systematic variations of measurements from their true values (systematic error; antonym, validity)
2. Variations of statistics from their true values as a result of systematic variation of measurements, other flaws in data collection, or flaws in study design and analysis.
Antonym: Validity

Biases
• Misclassification
• Selection (channeling)
• Losses to follow-up (correlated to exposure and disease)
• Effect of non-participation

Selection bias

Confounding bias
[Diagram labels: Intervention; Outcome; Confounder; Effect modification]
• Age
• Sex
• Stage of disease
• Previous treatments
• Genetics
• Behaviour
• Others

Channeling Effect (or Channeling Bias):
• The tendency of clinicians to prescribe treatment based on a patient’s prognosis. As a result of this behavior, comparisons between treated and untreated patients will yield a biased estimate of treatment effect.

Key Questions / Comparisons
[Figures: All patients; Restricted cohort]

Strengths of cohort studies
• Useful if exposure is rare
• Can examine multiple effects of a single exposure
• Can elucidate temporal relationship
• If prospective, minimizes ascertainment bias
• Allows direct measurement of disease incidence in both exposed and non-exposed groups

Limitations of cohort studies
• Inefficient for rare diseases
• Can be expensive and time consuming if prospective
• If retrospective, need reliable records
• Validity affected by losses to follow-up

Cholera in London in the mid-1800s: John Snow and the Beginnings of Epidemiology

Miasmata Theory
• Thought that cholera was brought to Europe from India
• Prevailing theory in the 1800s: airborne poison arising from unhealthy and unsanitary conditions (“miasmata”)
– Miasma: noxious exhalations from putrescent organic matter; poisonous effluvia or germs infecting the environment

Snow’s Experimentum Crucis
Water Supply from Polluted Thames River: Southwark & Vauxhall Co.; Lambeth Co.

Hypothesis
• Higher rates in the south because water companies drew water from the polluted Thames River
• Relatively low rates of cholera in London

Natural experiment: In 1852, Lambeth changed its source to a less polluted part of the Thames

1854 epidemic:
• Snow determined no. of homes served by each company
• Collected death reports and classified deaths by water company
• Calculated ratios of deaths to no. of homes, by water company

1849 Epidemiology

1854 Epidemiology
Unit of observation is mixed:
1. Numerator - the individual: fact, date, cause of death, and water company
• Water company obtained from detailed inquiry or test of water for concentrations of NaCl
• Exposures to the “causal agent”: inferred to be related to the water supply
– Thus, the company that supplied the water is a surrogate variable
• Use of “company” is referred to as an “ecological” variable
– Every individual and home so classified is assumed to have the same exposure (homogeneity of exposure)
2. Denominator – the number of homes (not individuals) served by each company
• This study would probably now be referred to as an “ecological study”

Statistic: Ratio = Numerator/Denominator (unit: persons/homes) – not a proportion (unitless)
Estimate of average no. of deaths per home

Deaths from Cholera per 10,000 Homes, by Source of Water Supply, London, 1854

Company                 Homes served   Deaths from cholera   Deaths/10,000 homes   Ratio   Difference
Rest of London          256,423        1,422                 55.5                  1       0
Lambeth                 26,107         98                    37.5                  0.7     -18.0
Southwark & Vauxhall    40,046         1,263                 315.4                 5.7     +259.9

Southwark & Vauxhall vs Lambeth: ratio = 315.4 ÷ 37.5 = 8.4

Broad Street Pump Episode
• Another detailed cluster investigation by Snow
• Occurred at the end of August 1854
• Attributed source: polluted well water contaminated from an adjoining cesspool, which was contaminated with water from a young girl who apparently had the cholera

Pump Handle Removed
[Figure: Broad Street Pump - Number of Deaths in 1854; x-axis: Date (from Aug 31, 1854); y-axis: No. of deaths (0 to 140); annotation: Pump handle removed (Sept 8); Total deaths = 573]

Contingency Table: Mortality from Cholera in Broad Street, Aug. 31-Sept. 2 (Whitehead’s observations: Shephard, p. 224)

Acknowledged to have:
              Drank water   Did not drink water   Total
Cholera       80            20                    100
No cholera    57            279                   336
Total         137           299                   436

Relative Risk as a Measure of Association
Risk of dying from cholera:
– drank water: 80 ÷ 137 = 0.584
– did not drink water: 20 ÷ 299 = 0.067
Relative risk (RR) = 0.584 ÷ 0.067 = 8.72

Odds Ratio as a Measure of Association
Odds of dying from cholera:
– drank water: 80 ÷ 57 = 1.40
– did not drink water: 20 ÷ 279 = 0.072
Odds ratio (OR) = 1.40 ÷ 0.072 = 19.6

RRs and ORs
• The OR >> RR because the disease is not rare – i.e., the risk of dying is 100/436 = 23%
• For small risks (∼≤0.05), OR ~ RR

Classic cohort studies
• British Doctors Cohort
• Framingham
• Harvard Nurses’ Health Study

Example: British Doctors Cohort Study
Design
[Timeline: 1951, 1957, 1966, 1972]
• Questionnaires on smoking habits to 59,600 male & female physicians - 34,440 responded (1st quest., response ~69%)
• More questionnaires
• Follow-up for mortality

British Doctors Cohort
• Overarching population (universe): entire population
• Target population: Men and women, age >20, in 1951
• Source population: British MDs, age >20, in 1951
– Sampling frame: Medical register of MDs
• Exposure: Smoking information from subjects based on a short postal questionnaire
– Current smokers
  • Age started smoking
  • Amount consumed currently
  • Method of smoking
– Past smokers
  • Same as above
  • Date stopped smoking
– Never smoked regularly (<1 cigarette/day for one year)

British Doctors Cohort
• Outcome:
– Mortality ascertained by looking up death certificates
– Cause of death is filled in by a physician or the coroner
• Analysis:
– Compare rates of death according to level of self-reported smoking

Typical Questions about Smoking
• Type of smoking (cigarettes, cigars, pipes)
• Have you ever smoked regularly?
• How old were you when you started to smoke?
• How many cigarettes per day do you smoke now?
• If you stopped completely, how long ago was this?

Metrics of Exposure to Tobacco Smoke
• The following indices can be estimated:
– Type of smoking (cigarettes, cigars, pipes)
– Duration (time since starting)
– Time since quitting
– Average intensity (e.g., no. of cigarettes/day)
– Frequency (e.g., percent time smoked in a week)
– Current smoking status

Metrics of Exposure
• Cumulative exposure: frequency of smoking x intensity x duration
– E.g., 1 pack per day x 20 cigarettes/pack x 365 days/year x 30 years = 219,000 cigarettes = 30 pack-years
• Lagged cumulative exposure (e.g., excluding last 10 years of smoking)

Definitions: Exposure and Dose
• Exposure: The presence of a substance in the environment external to the subject (external/environmental)
• Dose: The amount of a substance that reaches susceptible targets in the body (internal)

British Doctors
• Amount smoked at time of administration of first questionnaire:
– Non-smokers
– Current: 1-14 cigs/day
– Current: 15-24 cigs/day
– Current: ≥ 25 cigs/day
• These groups represent sub-cohorts defined by exposure at time of entry into the study
• However, information obtained during follow-up can change exposure status, so these sub-cohorts would not be fixed

British Doctors Cohort: Men

Survey period             1st Quest       2nd Quest        3rd Quest        4th Quest
Known to have died        N/A             3,122            7,301            10,634
Presumably alive          N/A             31,318           27,139           23,806
Replied                   40,637 (69%)    30,810 (98.4%)   26,163 (96.4%)   23,299 (97.9%)
Reasons for nonresponse   18,963          508              976              507
  Too ill                 N/A             31               65               21
  Refused                 N/A             36               63               102
  Not found               N/A             72               403              22
  Other                   N/A             369              445              362

British Doctors Study: Lung Cancer in Men among Current Smokers from Data Obtained at Last Questionnaire

                                 Age-standardized death rate (×10^-5)   Mortality Rate Ratio
Non smokers                      10                                     1
Cigarettes only                  140                                    14 (=140/10)
Pipe &/or cigars                 58                                     5.8 (=58/10)
Mixed                            82                                     8.2
Cigarettes only (No. per day)
  1-14                           78                                     7.8
  15-24                          127                                    12.7
  ≥ 25                           251                                    25.1

Nested Case-Control Studies
• Sub-study that is based on an explicit cohort
• Motivation:
– Computational ease for large datasets
– Require additional information not already collected
• To reduce costs, a sample of subjects from the original cohort is taken

Synonyms
• Case-control-within-cohort studies
• Incidence density sampling studies
• Synthetic case-control studies
• Case-control studies are also referred to as case-referent studies

Incidence Density Sampling
[Figure: follow-up lines for subjects No. 1 to 8 against Time (0 to 16), marking the time of the 1st failure with its risk set and the time of the 2nd failure with its risk set]

Incidence Density Sampling
1. For each failure time (T) of each case, define all subjects who at that time are still at risk of developing the outcome
– The complete set of such subjects is called the risk set for the case
– Will exclude all subjects who before T were:
  • Censored
  • Failed
2. Randomly select without replacement a sample of “controls” from the risk set
• These subjects are therefore “matched” to the case by time of event
• Other matching variables can be used so that the sampling is stratified; e.g., select only a random sample of women
• A potential control who eventually becomes a case remains eligible, because he is still at risk at the time of the event
• A fixed number of controls can be selected; that number can vary from risk set to risk set
3. The analysis of these data is similar to the stratified analysis used in the M-H procedure for rates
4. The strata are now defined as each selected risk set.
5. The measure of association is the odds ratio. With this sampling strategy and a matched analysis, it provides an unbiased estimate of the rate ratio.
• A matched analysis is one that accounts explicitly for the matching during the fieldwork
6. The estimated OR will have more variability than the full M-H cohort analysis because fewer subjects are included
7. There is no need to calculate person-years in this analysis. It is subsumed automatically in the sampling.
8. Odds ratios in each risk set are not calculated; rather a summary estimate across all risk sets is obtained.
• This assumes that the rate ratio does not vary by time (proportional hazards assumption). Equivalently, the ORs across strata (matched subjects) are ~ equal (homogeneous).
9. Only risk sets that are discordant on exposure contribute information

Examples

Background
• Stenting common Rx for CAD symptoms
• Statin therapy improves survival in secondary prevention in conservatively treated patients
• Is the same benefit present following stenting?

Methods
• 4,520 patients aged < 80
• Examined 1 year mortality
• 3,585 with statins on discharge
• 935 no statins on discharge

Results
• Mortality 2.6% statins, 5.6% no statins
• Unadjusted OR 0.46 (95% CI 0.33 – 0.65)
• Adjusted OR 0.51 (95% CI 0.36 – 0.71)
• Methods included propensity analysis for statin prescription and Cox PH model with a substantial number of clinical covariates

Typical RCT

So, what’s the problem?
• 51% reduction in mortality observed in 12 months
• 24% reduction in mortality observed in 72 months (NEJM 1998;339:1349-57)

Red Flag
• If it looks too good to be true, it probably is too good to be true

Potential Biases
• Channeling (selection bias in pharmacoepi studies)
• Misclassification (exposure is not time independent)

Person-time
[Figure, two panels: follow-up of 8 patients (4 in the Statin Group, 4 in the No Statin Group) over Time (Months), 1 to 12; X marks a death. Annotations in the second panel: "D/C @ 1 month - 11 months non-statin exposure" and "Start statin @ 1 month - 6 months statin exposure".]
Exposure classified by group:
– RR = (1/4) ÷ (2/4) = 0.5
– RR = (1 / 42 person-months) ÷ (2 / 42 person-months) = 0.5
Exposure classified by person-time:
– Statin = 2 / 37 person-months
– Non statin = 1 / 47 person-months
– RR = (2/37) ÷ (1/47) ≈ 2.5

A different approach

Message
• Vital to consider the time dependency of drug exposure
• Another relatively easy method is to perform a nested case control study that matches on cohort entry
• Assure equal follow-up time

Results: Decrease in mortality of 34% (95% CI 4-55%) after 36 months
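The person-time example above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `rate_ratio` is hypothetical, and the bookkeeping that moves 11 person-months out of the statin group and 6 person-months into it is one reading of the figure annotations, assumed here because it reproduces the slide's denominators of 37 and 47. The computed ratio may differ slightly from a printed figure through rounding.

```python
def rate_ratio(events_exposed, pt_exposed, events_unexposed, pt_unexposed):
    """Incidence rate ratio: (events / person-time) in exposed vs unexposed."""
    if pt_exposed <= 0 or pt_unexposed <= 0:
        raise ValueError("person-time must be positive")
    if events_unexposed == 0:
        raise ValueError("rate ratio is undefined with zero unexposed events")
    return (events_exposed / pt_exposed) / (events_unexposed / pt_unexposed)


# Exposure fixed by group at entry: 1 death in 42 person-months (statin)
# versus 2 deaths in 42 person-months (no statin).
naive_rr = rate_ratio(1, 42, 2, 42)

# Assumed reading of the figure: one statin-group patient contributes
# 11 months of non-statin exposure, and one no-statin-group patient
# contributes 6 months of statin exposure before dying.
statin_pm = 42 - 11 + 6      # 37 person-months on statin
non_statin_pm = 42 + 11 - 6  # 47 person-months off statin

# After reclassification the deaths are 2 (statin) and 1 (no statin).
time_dependent_rr = rate_ratio(2, statin_pm, 1, non_statin_pm)

print(f"Exposure fixed at entry:  RR = {naive_rr:.2f}")
print(f"Time-dependent exposure:  RR = {time_dependent_rr:.2f}")
```

Total person-time (84 months) and total deaths (3) are the same under both classifications. Only the allocation between exposure groups changes, and that alone moves the estimate from apparently protective to apparently harmful.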