Cross-sectional studies

UE11 – Clinical Research –
Descriptive Studies - Lecture 3
9 March 2016
Pierre-Yves ANCEL
RT : Romain Millot
Evrard-Florentin Ndiki Mayi
RL : Quentin Desormiere
Outline:
I- Different types of studies
II- Cross sectional studies
III- Sampling
IV- Sampling error
V- Systematic error (or bias)
Learning objectives
1. List basic characteristics of cross-sectional studies
2. Learn to identify advantages and disadvantages of cross-sectional studies
3. List sampling difficulties and biases
I. Different types of studies
Epidemiology uses different types of studies. We usually distinguish:
1) Experimental studies
– The investigators:
• study the impact of a factor/exposure that they can control.
• try to control the environment in which the hypothesis is tested (the randomized,
double-blind clinical trial is the gold standard).
– In general, experimental studies are limited to interventions that are believed to be of
potential benefit. Risk factors are not assessed with this kind of study.
For example, if you have a new treatment and you want to know whether it is effective,
you will use this kind of study, because you control the exposure (here, the treatment):
you decide who will be treated and who will not.
2) Observational studies
– The role of the investigator is to observe what happens, noting who is exposed or unexposed
and who has or has not developed the outcome of interest.
– The population is observed without any interference by the investigator.
Among observational studies, we distinguish two types:
• Descriptive studies
– They aim to describe the distribution of a disease in a population and to observe
the basic features of its distribution in terms of time, place, and person.
• Analytical studies
– They test a specific hypothesis about the relationship of a disease to a putative cause, by
conducting an epidemiologic study that relates the exposure of interest to the disease of
interest.
II. Cross sectional studies
1) Definition
Cross-sectional studies are descriptive studies that measure the prevalence of an exposure or
an outcome (disease) at a “point” in time:
– They can measure attitudes, beliefs, behaviors, genetic factors, health conditions, or anything
else that does not require follow-up to assess
– They provide a “snapshot” of a population:
→ Individuals included = those present at a point in time
→ Collection of data: Outcome(s) and Exposure(s)
2) Study populations
We distinguish two ways of collecting observations:
• Cross-sectional: only ONE set of observations is collected for every unit (every person,
for example) in the study, at a certain point in time.
• Longitudinal: TWO or MORE sets of observations are collected for every unit in the
study, which means the data collection is repeated for every unit.
3) Analysis of a cross sectional study
Definition of prevalence: a proportion, usually expressed as a percentage, whose numerator is
the number of people with the disease (for example) and whose denominator is the number of
people in the population. In a cross sectional study, you can measure the prevalence of a
disease or the prevalence of an exposure.
You can also study the prevalence of the disease according to the exposure.
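The definition above can be sketched in a few lines of Python. The function name `prevalence` and all counts below are hypothetical, chosen only to illustrate the calculation; they are not data from the lecture.

```python
def prevalence(cases: int, population: int) -> float:
    """Return the prevalence as a proportion: cases / population."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must be between 0 and population")
    return cases / population


# Hypothetical counts: (people with the disease, people in the group)
exposed = (30, 400)
unexposed = (45, 1100)

# Prevalence of the disease in the whole sample
overall = prevalence(exposed[0] + unexposed[0], exposed[1] + unexposed[1])

# Prevalence of the disease according to the exposure
p_exposed = prevalence(*exposed)
p_unexposed = prevalence(*unexposed)

print(f"Overall prevalence: {overall:.1%}")
print(f"Prevalence among exposed: {p_exposed:.1%}")
print(f"Prevalence among unexposed: {p_unexposed:.1%}")
```

The same function applies to the prevalence of an exposure: the numerator is then the number of exposed people.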
Example: study on the relationship between the prevalence of atherosclerosis and late life
depression syndromes. (Tiemeier et al. Arch Gen Psychiatry, 2004). Methods: Researchers
measured the prevalence of coronary artery calcification (atherosclerosis) and the prevalence of
depressive symptoms in a large sample of elderly men and women in Rotterdam (n=1920).
With P(depression) and P(atherosclerosis) we estimate the overall prevalences, and with
P(depression | atherosclerosis = 1) and P(depression | atherosclerosis = 0) we estimate the
prevalence of depression according to atherosclerosis status.
We can see that the prevalence of depression is higher among people with atherosclerosis,
but a statistical test is needed before concluding. To compare two percentages we use
the chi-squared (χ²) test, which tells us whether the difference between them is statistically
significant.
The difference may not seem large, but the risk ratio (here a ratio of prevalences,
5.2/3.8) is about 1.4, which means a relative increase of about 40%.
So the conclusion might be that atherosclerosis increases the risk of depression.
However, we have to think about the relationship between cause and effect.
Because both were measured at the same time, we cannot be sure which is the cause and which
is the effect. This is a common problem in cross sectional studies.
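As a minimal sketch of the two calculations mentioned above, the Python code below computes the ratio 5.2/3.8 and a Pearson chi-squared test on a 2x2 table. The lecture gives only the two percentages, so the counts in the table are hypothetical (they are not the Rotterdam data), and the function names are illustrative. The test is written without continuity correction, and its p-value uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def prevalence_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the prevalence among exposed to the prevalence among unexposed."""
    return p_exposed / p_unexposed


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    Table layout:  exposed:   a with outcome, b without
                   unexposed: c with outcome, d without
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Percentages quoted in the lecture: 5.2% vs 3.8%
ratio = prevalence_ratio(0.052, 0.038)
print(f"Prevalence ratio: {ratio:.2f}")

# Hypothetical counts: 50/1000 depressed among exposed, 30/1000 among unexposed
statistic, p_value = chi_squared_2x2(50, 950, 30, 970)
print(f"chi2 = {statistic:.2f}, p = {p_value:.3f}")
```

A small p-value only indicates that the difference is unlikely to be due to chance; it says nothing about which variable is the cause.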
4) Advantages and limitations of cross sectional studies
• Advantages
– Good for identifying prevalence of common outcomes
– Multiple outcomes and exposures can be studied
– Relatively quick and easy to conduct
• Limitations
– Difficult to determine whether the outcome followed exposure in time or
exposure resulted from the outcome
→ Associations identified may be difficult to interpret
– Prevalence is influenced by the incidence rate and the duration of disease
(persons who survive longer with their disease are more likely to be counted
in the numerator)
– Unable to measure incidence, since incidence is calculated over a period of
time.
– Not suitable for studying rare diseases or diseases with a short duration.
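To illustrate why prevalence depends on both incidence and duration, the sketch below uses a standard steady-state approximation that is not stated in the lecture and is added here as an assumption: prevalence odds P/(1-P) = incidence rate × mean duration of disease. The function name and the numbers are hypothetical.

```python
def steady_state_prevalence(incidence_rate: float, mean_duration: float) -> float:
    """Prevalence under the steady-state assumption P / (1 - P) = I * D.

    incidence_rate: new cases per person-year
    mean_duration:  average duration of the disease, in years
    """
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Same hypothetical incidence (1 new case per 100 person-years), two durations
short_disease = steady_state_prevalence(0.01, 0.1)   # disease lasts about 5 weeks
long_disease = steady_state_prevalence(0.01, 10.0)   # disease lasts 10 years

print(f"Short duration: {short_disease:.2%}")
print(f"Long duration:  {long_disease:.2%}")
```

With the same incidence, the long-lasting disease is far more prevalent, which is why a snapshot study captures few cases of rare or short-duration diseases.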
III. Sampling
Example of a research question: What is the frequency of HBV among women aged 25-45 years in
the Ile-de-France region?
• Questions
– Variable of interest?
– Population?
– How to obtain the information about the variable?
We have to build a sample, because it would not be possible to work on the entire population.
Why?
• Reasons:
- Unable to study all members of a population
- Measurements may be better in a sample than in the entire population
- Saves time and money, which improves feasibility
1) Definition
Sampling: The process of selecting units from a specific population to collect information on a
variable of interest
Sampling frame: List of all the sampling units from the eligible population, e.g. all women aged 25-45 y old…
Sampling unit: The basic unit (subject under observation) from which information is collected
Ex: Person: a woman aged 25-45 y old // Group: household, school, district, etc.
Sampling fraction: Ratio between sample size and population size
We have two main sampling schemes (method of selecting sampling units from sampling frame):
1. Non-random sampling
• You choose who you think should be included in the study
• Cheaper, but likely to be biased, because you choose people based on certain criteria.
2. Random sampling
• A “randomization” process selects the sample, so that no unit is favored in the selection,
which could introduce selection bias
• The only method that allows valid conclusions about the population
We generally prefer random sampling for this reason.
An example of random sampling is: Simple random sampling.
-Principle: Each sampling unit has an equal chance of being included in the sample.
-Procedure: Need listing of all sampling units (“sampling frame”).
-Advantages: Simple process and easy to understand / Easy calculation of means and variance.
-Disadvantage: Requires knowledge of the complete sampling frame.
Based on this table we selected those three persons. With a larger population, we use
a computer to perform the random selection.
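A computer-based simple random sample can be sketched as follows, using Python's standard `random` module. The sampling frame of 20 labelled persons, the function name and the seed are hypothetical; drawing three persons mirrors the example above.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: a complete list of 20 sampling units
frame = [f"person_{i:02d}" for i in range(1, 21)]

sample = simple_random_sample(frame, 3, seed=2016)
sampling_fraction = len(sample) / len(frame)

print("Selected units:", sample)
print(f"Sampling fraction: {sampling_fraction:.0%}")
```

The function needs the complete list of units, which is exactly the disadvantage noted above: without a sampling frame, simple random sampling is not possible.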
The alternative is non-random sampling, where you choose who should be part of the
study. The probability of being chosen is then totally unknown! It is cheaper but more likely to
be biased.
If we study a disease like tuberculosis and choose medical students at Paris Descartes, the
sample will be biased because this population does not represent the general population.
IV. Sampling error
A- Sampling error
1) Random error
No sample is an exact mirror image of the population: because of chance, different samples will
produce different results. This must be taken into account when using a sample to make
inferences about a population.
The reasoning is as follows:
- The estimate (from the sample) differs from the true underlying value (in the population).
- This may result in either an under- or an overestimation of the true value.
- The variability of the estimate is measured by the standard error.
The standard error can be calculated. It depends on the size of the sample (the standard error
decreases as the sample size increases).
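The lecture does not give a formula for the standard error. For an estimated proportion p in a simple random sample of size n, the usual expression is sqrt(p(1-p)/n); the sketch below assumes that formula and a hypothetical prevalence of 10% to show how the standard error shrinks as n grows.

```python
import math


def standard_error_proportion(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


p = 0.10  # hypothetical prevalence
for n in (100, 400, 1600):
    se = standard_error_proportion(p, n)
    print(f"n = {n:4d}  standard error = {se:.4f}")
# n =  100  standard error = 0.0300
# n =  400  standard error = 0.0150
# n = 1600  standard error = 0.0075
```

Multiplying the sample size by four only halves the standard error, so precision becomes increasingly expensive.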
2) Illustration
Example 1
Objective:
We want to know the average height of men on earth. This average height exists but it is very
difficult to know it.
Method:
We measure hundreds or thousands of men and calculate their average height.
Discussion:
The average height among these men is probably not exactly equal to the average height of
men on earth (because they are only a part of the whole population). But with a
representative sample of the population, it should be close enough. The difference between the
quantity that you want to know (average height of men on earth) and its estimate from
your sample (average height of men in the sample) is the sampling error.
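The height example can be simulated. In the sketch below the "population" is 100,000 simulated heights drawn from a normal distribution with mean 175 cm and standard deviation 7 cm; these values are assumptions made for illustration, not real data. Repeated random samples show that the sampling error varies from sample to sample and is smaller for larger samples.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the simulation is reproducible

# Simulated population of heights in cm (hypothetical parameters)
population = [rng.gauss(175, 7) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def sampling_error(sample_size: int) -> float:
    """Sample mean minus the true population mean, for one random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample) - true_mean


# 200 repeated samples for each of two sample sizes
errors_small = [sampling_error(25) for _ in range(200)]
errors_large = [sampling_error(2500) for _ in range(200)]

# Spread of the sampling errors across repeated samples
spread_small = statistics.pstdev(errors_small)
spread_large = statistics.pstdev(errors_large)

print(f"Spread of errors with n = 25:   {spread_small:.2f} cm")
print(f"Spread of errors with n = 2500: {spread_large:.2f} cm")
```

In practice only one sample is drawn and the true mean is unknown, so the sampling error itself cannot be observed; its typical size is what the standard error quantifies.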
Example 2
The prevalence of preterm birth is estimated as p0 = 6.3%.
But the estimate could have been different in another sample. No sample is an exact image of
the population → sampling error: random variations of p0.
Accordingly, p0 = 6.3% is an indication, a point estimate. We add a 95% confidence interval
(= an interval computed so that, over repeated samples, 95% of such intervals contain the true
prevalence).
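A 95% confidence interval around p0 can be sketched with the normal approximation p0 ± 1.96 × sqrt(p0(1-p0)/n). The lecture does not state the sample size behind the 6.3% estimate, so n = 10,000 births below is a hypothetical value, and the normal approximation is itself an assumption that is reasonable only for large samples.

```python
import math


def confidence_interval_95(p: float, n: int) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = 1.96 * standard_error
    return p - margin, p + margin


p0 = 0.063   # point estimate from the lecture
n = 10_000   # hypothetical sample size, not given in the lecture

lower, upper = confidence_interval_95(p0, n)
print(f"p0 = {p0:.1%}, 95% CI = [{lower:.1%}, {upper:.1%}]")
```

A larger sample would give a narrower interval around the same point estimate.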
B- How to build a sample?
We want to assess the frequency of a specific disease in France. We have then a methodological
question: How many people should be included?
To determine the number of people to be included, we should estimate:
- The expected prevalence, e.g. p = 10%
- The precision around the prevalence, e.g. 10% ±2%
These values can be approximated from existing research or data. If you have no prior
idea of the prevalence, you can compute the sample size for several plausible values.
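The lecture does not give the sample size formula. A common choice for estimating a prevalence p with precision ±d at the 95% confidence level is n = 1.96² × p(1-p) / d², rounded up; the sketch below assumes that formula and simple random sampling. The function name is illustrative.

```python
import math


def required_sample_size(expected_prevalence: float, precision: float) -> int:
    """Smallest n giving a 95% confidence interval of +/- precision around the prevalence."""
    z = 1.96  # normal quantile for a 95% confidence level
    n = z**2 * expected_prevalence * (1 - expected_prevalence) / precision**2
    return math.ceil(n)


# Values from the lecture: expected prevalence 10%, precision +/- 2%
print(required_sample_size(0.10, 0.02))  # 865

# Trying several plausible prevalences when the true value is uncertain
for p in (0.05, 0.10, 0.20):
    print(f"p = {p:.0%}: n = {required_sample_size(p, 0.02)}")
```

For a fixed precision, the required sample size grows as the expected prevalence approaches 50%, so the least favorable plausible value is the cautious choice.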
V. Systematic error (or bias)
A- Representativeness
The sample should accurately represent the distribution of the characteristics of the population
(age, sex, urban vs rural, seasonality…). Representativeness is essential to generalise, hence
the random selection of samples.
Representative sample:
- Characteristics of the sample are comparable to the characteristics of the eligible population.
- Point estimates are consistent with expected results in the population.
Non-representative sample:
- Characteristics of the sample are different from the characteristics of the eligible population.
- Point estimates are inconsistent with expected results in the population.
B- Information bias
Information bias is a systematic problem in collecting information, mainly through
inaccurate measurement (problem of scales) or badly asked questions (ambiguity, not offering
appropriate answer options).
Example: Have you ever smoked?
- Does this cover cigarettes, cigars…?
- Does the one and only cigarette you smoked count?
→ A possible formulation of the question would be: Have you ever smoked as much as one
cigarette a day for as long as one year?
Bonus
Breast cancer
A cohort begins with 10,000 women. Of these, 500 have or have had breast cancer before the
study began. The remaining 9,500 women are followed up for five years, during which 250
breast cancer cases occur.
What is the prevalence of breast cancer at the beginning of the study?
500/10000 (number of women with breast cancer at the beginning/sample population)
What is the five-year incidence risk of breast cancer in the cohort?
250/9500 (number of women with breast cancer occurring during the study/number of women
without cancer at the beginning of the study).
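The two answers can be checked with a short calculation. The variable names are illustrative; the numbers are those of the exercise.

```python
cohort_size = 10_000      # women at the start of the study
prevalent_cases = 500     # women who have or have had breast cancer at baseline
new_cases = 250           # cases occurring during the five years of follow-up

# Prevalence: existing cases divided by the whole cohort
baseline_prevalence = prevalent_cases / cohort_size

# Incidence risk: new cases divided by the women at risk (free of disease at baseline)
at_risk = cohort_size - prevalent_cases
five_year_risk = new_cases / at_risk

print(f"Prevalence at baseline: {baseline_prevalence:.1%}")    # 5.0%
print(f"Five-year incidence risk: {five_year_risk:.2%}")       # 2.63%
```

The denominators differ: prevalence uses the whole cohort, whereas incidence excludes the 500 women who already had the disease and therefore could not become new cases.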
SUMMARY SHEET
Cross sectional studies:
They measure the prevalence of an exposure or outcome at a “point” in time
• Cross-sectional: only ONE set of observations is collected
• Longitudinal: TWO or MORE sets of observations are collected
Prevalence of a disease and an exposure:
Prevalence of a disease according to the exposure:
Important+++:
• Advantages
– Good for identifying prevalence of common outcomes
– Multiple outcomes and exposures can be studied
– Relatively quick and easy to conduct
• Limitations
– Difficult to determine whether the outcome followed exposure in time or exposure resulted
from the outcome
→ Associations identified may be difficult to interpret
– Prevalence is influenced by the incidence rate and the duration of disease (persons who
survive longer with their disease are more likely to be counted in the numerator)
– Unable to measure incidence, since incidence is calculated over a period of time.
– Not suitable for studying rare diseases or diseases with a short duration.
Sampling:
The process of selecting units from a specific population to collect information on a variable of
interest
Non-random sampling
Biased, because you’ll choose people based on certain criteria.
Random sampling
The only method that allows valid conclusions about population
Sampling error:
-Variability is measured by the standard error: standard error decreases as the sample size
increases.
-The difference between the quantity that you want to know and its estimation through your
sample is the sampling error.
Build a sample:
Systematic error (or bias):
Representativeness is essential to generalise, so the characteristics of the sample must be
comparable to the characteristics of the eligible population.
Information bias:
Information bias consists of a systematic problem in collecting information, mainly through
inaccurate measuring or badly asked questions.