UE11 – Clinical Research – Descriptive Studies - Lecture 3
9 March 2016
Pierre-Yves ANCEL
RT : Romain Millot Evrard-Florentin Ndiki Mayi
RL : Quentin Desormiere

Cross-sectional studies

Outline:
I- Different types of studies
II- Cross sectional studies
III- Sampling
IV- Sampling error
V- Systematic error (or bias)

Learning objectives
1. List basic characteristics of cross-sectional studies
2. Learn to identify advantages and disadvantages of cross-sectional studies
3. List sampling difficulties and biases

I. Different types of studies

In epidemiology we usually distinguish two main types of studies:

1) Experimental studies
– The investigators:
• study the impact of a factor/exposure which they can control.
• try to control the environment in which the hypothesis is tested (the randomized, double-blind clinical trial is the gold standard)
– In general they are limited to interventions that are believed to be of potential benefit. Risk factors are not assessed with this kind of study.
For example, if you have a new treatment and want to know whether it is effective, you will use this kind of study, because you control the exposure (here, the treatment): you decide who is treated and who is not.

2) Observational studies
– The role of the investigator is to observe what happens, noting who is exposed or unexposed and who has or has not developed the outcome of interest.
– The population is observed without any interference by the investigator.

Among observational studies we distinguish two types:
• Descriptive studies – They aim to describe the distribution of a disease in a population, observing the basic features of its distribution in terms of time, place, and person.
• Analytical studies – They test a specific hypothesis about the relationship of a disease to a putative cause, by conducting an epidemiologic study that relates the exposure of interest to the disease of interest.

II. Cross sectional studies

1) Definition
Cross-sectional studies are descriptive studies that measure the prevalence of an exposure or outcome at a “point” in time. They:
– Can measure attitudes, beliefs, behaviors, genetic factors, health conditions, or anything else that does not require follow-up to assess
– Provide a “snapshot” of a population:
Individuals included = those present at a point in time
Collection of data: Outcome(s) and Exposure(s)

2) Study populations
We distinguish:
• Cross-sectional: only ONE set of observations is collected for every unit in the study (every person, for example), at a certain point in time
from:
• Longitudinal: TWO or MORE sets of observations are collected for every unit in the study, which means the data collection has to be repeated for every unit.

3) Analysis of a cross sectional study
Definition of prevalence: it is a proportion, usually expressed as a percentage, where the numerator is the number of people with the disease (for example) and the denominator is the number of people in the population.
In a cross-sectional study you can measure the prevalence of a disease or the prevalence of an exposure. You can also study the prevalence of the disease according to the exposure.

Example: study on the relationship between the prevalence of atherosclerosis and late-life depression syndromes (Tiemeier et al. Arch Gen Psychiatry, 2004).
Methods: Researchers measured the prevalence of coronary artery calcification (atherosclerosis) and the prevalence of depressive symptoms in a large sample of elderly men and women in Rotterdam (n=1920).
With P(depression) and P(atherosclerosis) we evaluate the prevalence of each condition, and with P(depression | atherosclerosis = 1) and P(depression | atherosclerosis = 0) we evaluate the prevalence of depression according to atherosclerosis.
The prevalence of depression is higher among people with atherosclerosis, but to conclude we need a statistical test. To compare two percentages we use the chi-squared (χ²) test, to check whether the difference is statistically significant.
The difference may not seem large, but the risk ratio (5.2/3.8; strictly a prevalence ratio here, since both figures are prevalences) is about 1.4, i.e. a relative increase of about 40%. So the conclusion might be that atherosclerosis increases the risk of depression.
However, we have to think about the relationship between cause and effect. We cannot be sure which is the cause and which is the effect, and this is a common problem in cross-sectional studies.

Cross sectional studies
• Advantages
– Good for identifying prevalence of common outcomes
– Multiple outcomes and exposures can be studied
– Relatively quick and easy to conduct
• Limitations
– Difficult to determine whether the outcome followed exposure in time or exposure resulted from the outcome; associations identified may be difficult to interpret
– Prevalence is influenced by the incidence rate and the duration of disease (persons who survive longer with their disease are more likely to be counted in the numerator)
– Unable to measure incidence, since incidence is calculated over a period of time.
– Not suitable for studying rare diseases or diseases with a short duration.

III. Sampling

Example of a research question: What is the frequency of HBV among women aged 25-45 years in the Île-de-France region?
• Questions
– Variable of interest?
– Population?
– How to obtain the information about the variable?
We have to build a sample, because it would not be possible to work on the entire population. Why?
• Reasons:
- Unable to study all members of a population
- Measurements may be better in a sample than in the entire population
- Saves time and money, which improves feasibility.

1) Definition
Sampling: The process of selecting units from a specific population to collect information on a variable of interest
Sampling frame: List of all the sampling units from the eligible population, e.g. all women aged 25-45 y old…
Sampling unit: The basic unit (subject under observation) from which information is collected
Ex: Person: a woman aged 25-45 y old // Group: household, school, district, etc.
Sampling fraction: Ratio between sample size and population size

There are two main sampling schemes (methods of selecting sampling units from the sampling frame):
1. Non-random sampling
• You choose who you think should be included in the study
• Cheaper, but likely to be biased, because you choose people based on certain criteria.
2. Random sampling
• A “randomization” process is used for sample selection, so that no unit receives preferential treatment that could introduce selection bias
• The only method that allows valid conclusions about the population
We generally prefer random sampling since it is the only method that allows valid conclusions.

An example of random sampling is simple random sampling.
- Principle: Each sampling unit has an equal chance of being included in the sample.
- Procedure: Needs a listing of all sampling units (the “sampling frame”).
- Advantages: Simple process and easy to understand / Easy calculation of means and variance.
- Disadvantage: Requires knowledge of the complete sampling frame.
Based on this table we selected those three persons, but with a larger population we use a computer to make the random selection.

The other way is non-random sampling, where you choose who should be part of the study. The probability of being chosen is then totally unknown! It is cheaper but more likely to be biased.
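Going back to the computer-based random selection mentioned for simple random sampling, a minimal sketch in Python could look like this. The sampling frame below is invented for illustration (1,000 numbered IDs), and the function name `simple_random_sample` is hypothetical; only the principle (equal chance for every unit, drawn from a complete sampling frame) comes from the lecture.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size units without replacement.

    Every unit of the sampling frame has the same chance of being selected.
    The seed only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(sampling_frame), sample_size)


# Hypothetical sampling frame: 1,000 eligible women identified by a number.
frame = [f"woman_{i:04d}" for i in range(1, 1001)]

sample = simple_random_sample(frame, 50, seed=2016)

# Sampling fraction = sample size / population size
sampling_fraction = len(sample) / len(frame)
print(len(sample), sampling_fraction)
```

The draw requires the complete list of units, which is exactly the disadvantage noted above: without a sampling frame there is nothing to draw from.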
For instance, if we study a disease like tuberculosis and pick only medical students at Paris Descartes (a non-random choice), the sample will be biased because this population does not represent the general population.

IV. Sampling error

A- Sampling error

1) Random error
No sample is an exact mirror image of the population: because of chance, different samples will produce different results. This must be taken into account when using a sample to make inferences about a population.
The reasoning is as follows:
- The estimate (from the sample) differs from the true underlying value (in the population).
- This may result in either an under- or an overestimation of the true value.
- Its variability is measured by the standard error.
The standard error can be calculated. It depends on the size of the sample (the standard error decreases as the sample size increases).

2) Illustration
Example 1
Objective: We want to know the average height of men on earth. This average height exists but is very difficult to know.
Method: We measure hundreds or thousands of men and calculate their average height.
Discussion: The average height among these men is probably not exactly equal to the average height of men on earth (because they are only some of the men in the whole population). But with a representative sample of the population, it should be close enough.
The difference between the quantity that you want to know (average height of men on earth) and its estimate from your sample (average height of men in the sample) is the sampling error.

Example 2
The prevalence of preterm birth estimated in a sample is p0 = 6.3%, but the estimate could have been different in another sample. No sample is an exact image of the population: the sampling error reflects random variations of p0.
Accordingly, p0 = 6.3% is an indication, a point estimate. We accompany it with a 95% confidence interval (an interval constructed so that, across repeated samples, 95% of such intervals would contain the true prevalence).

B- How to build a sample?
We want to assess the frequency of a specific disease in France.
This raises a methodological question: how many people should be included?
To determine the number of people to include, we need to specify:
- The expected prevalence, e.g. p = 10%
- The desired precision around the prevalence, e.g. 10% ± 2%
These values can be approximated from existing research or data. If you have no idea of what the outcome of your study could be, you can try different values.

V. Systematic error (or bias)

A- Representativeness
The sample should accurately represent the distribution of the characteristics of the population (age, sex, urban vs rural, seasonality…). Representativeness is essential to generalise, hence the random selection of samples.
Representative sample: Characteristics of the sample are comparable to the characteristics of the eligible population. Point estimates are consistent with expected results in the population.
Non representative sample: Characteristics of the sample are different from the characteristics of the eligible population. Point estimates are inconsistent with expected results in the population.

B- Information bias
Information bias is a systematic problem in collecting information, mainly through inaccurate measurement (problem of scales) or badly asked questions (ambiguity, not offering appropriate answer options).
Example: Have you ever smoked?
- Cigarettes, cigars…?
- Does the one and only cigarette you smoked count?
A possible formulation of the question would be: Have you ever smoked as much as one cigarette a day for as long as one year?

Bonus
Breast cancer
A cohort begins with 10,000 women. Of these, 500 have or have had breast cancer before the study began. The remaining 9,500 women are followed up for five years, during which 250 breast cancer cases occur.
What is the prevalence of breast cancer at the beginning of the study?
500/10,000 = 5% (number of women with breast cancer at the beginning / sample population)
What is the five-year incidence risk of breast cancer in the cohort?
250/9,500 ≈ 2.6% (number of women developing breast cancer during the study / number of women without cancer at the beginning of the study).

SUMMARY SHEET

Cross sectional studies: they measure the prevalence of an exposure or outcome at a “point” in time
• Cross-sectional: only ONE set of observations is collected
• Longitudinal: TWO or MORE sets of observations are collected
Prevalence of a disease and an exposure:
Prevalence of a disease according to the exposure:
Important+++:
• Advantages
– Good for identifying prevalence of common outcomes
– Multiple outcomes and exposures can be studied
– Relatively quick and easy to conduct
• Limitations
– Difficult to determine whether the outcome followed exposure in time or exposure resulted from the outcome; associations identified may be difficult to interpret
– Prevalence is influenced by the incidence rate and the duration of disease (persons who survive longer with their disease are more likely to be counted in the numerator)
– Unable to measure incidence, since incidence is calculated over a period of time.
– Not suitable for studying rare diseases or diseases with a short duration.

Sampling: The process of selecting units from a specific population to collect information on a variable of interest
Non-random sampling: Biased, because you choose people based on certain criteria.
Random sampling: The only method that allows valid conclusions about the population

Sampling error:
- Variability is measured by the standard error: the standard error decreases as the sample size increases.
- The difference between the quantity that you want to know and its estimate from your sample is the sampling error.
Build a sample:

Systematic error (or bias): Representativeness is essential to generalise, so the characteristics of the sample need to be comparable to those of the eligible population.
Information bias: a systematic problem in collecting information, mainly through inaccurate measurement or badly asked questions.
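As a complement to this summary, the main calculations of the lecture (prevalence, prevalence ratio, standard error, 95% confidence interval, sample size) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the confidence interval and the sample size use the usual normal approximation with z = 1.96 (the lecture does not give a formula), and the sample size n = 1000 used for the preterm birth example is invented, since the lecture does not report it.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence level (assumed approximation)


def prevalence(cases, population):
    """Prevalence = number of people with the disease / number of people in the population."""
    return cases / population


def standard_error(p, n):
    """Standard error of a proportion p estimated on a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def confidence_interval(p, n, z=Z_95):
    """Approximate confidence interval around a point estimate p (normal approximation)."""
    half_width = z * standard_error(p, n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def sample_size(expected_p, precision, z=Z_95):
    """Number of people needed to estimate expected_p within +/- precision."""
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / precision ** 2)


# Bonus exercise: figures taken from the lecture.
baseline_prevalence = prevalence(500, 10_000)  # women with breast cancer at baseline
five_year_risk = 250 / 9_500                   # new cases / women at risk at baseline

# Depression according to atherosclerosis: 5.2% vs 3.8% as quoted in the lecture.
prevalence_ratio = 5.2 / 3.8

# Preterm birth: p0 = 6.3% comes from the lecture, n = 1000 is a HYPOTHETICAL sample size.
lower, upper = confidence_interval(0.063, 1000)

# Sample size for an expected prevalence of 10% with a precision of +/- 2%.
n_needed = sample_size(0.10, 0.02)

print(f"baseline prevalence: {baseline_prevalence:.1%}")
print(f"five-year incidence risk: {five_year_risk:.1%}")
print(f"prevalence ratio: {prevalence_ratio:.2f}")
print(f"95% CI for p0 (hypothetical n=1000): {lower:.3f} to {upper:.3f}")
print(f"people to include: {n_needed}")
```

With the lecture's values (p = 10%, precision ± 2%) this approximation gives 865 people. Trying other expected prevalences or precisions in `sample_size` mirrors the advice to test different values when the expected outcome is unknown.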