Estimating the Allele Frequency of Autosomal Recessive Disorders

doi: 10.1111/j.1469-1809.2011.00693.x
Estimating the Allele Frequency of Autosomal Recessive
Disorders through Mutational Records and Consanguinity:
The Homozygosity Index (HI)
Alessandro Gialluisi1, Tommaso Pippucci1, Yair Anikster2, Ugur Ozbek3, Myrna Medlej-Hashim4,5, Andre Mégarbané4 and Giovanni Romeo1∗
1 Unità Operativa di Genetica Medica, Dipartimento di Scienze Ginecologiche, Ostetriche e Pediatriche, Policlinico Sant’Orsola Malpighi, Università di Bologna, Bologna, Italy
2 Metabolic Disease Unit, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, Tel Hashomer, Israel
3 Genetics Department, Institute for Experimental Medicine, Istanbul University, Istanbul, Turkey
4 Unité de Génétique Médicale and Inserm, Laboratoire International Associé à l’UMR_S 910, Université Saint Joseph, Beirut, Lebanon
5 Department of Life and Earth Sciences, Faculty of Sciences, Branch II, Lebanese University, Beirut, Lebanon
Summary
In principle mutational records make it possible to estimate frequencies of disease alleles (q) for autosomal recessive
disorders using a novel approach based on the calculation of the Homozygosity Index (HI), i.e., the proportion of
homozygous patients, which is complementary to the proportion of compound heterozygous patients P(CH). In other
words, the rarer the disorder, the higher will be the HI and the lower will be the P(CH). To test this hypothesis we used
mutational records of individuals affected with Familial Mediterranean Fever (FMF) and Phenylketonuria (PKU), born to
either consanguineous or apparently unrelated parents from six population samples of the Mediterranean region. Despite
the unavailability of precise values of the inbreeding coefficient for the general population, which are needed in the case
of apparently unrelated parents, our estimates of q are very similar to those of previous descriptive epidemiological studies.
Finally, we inferred from simulation studies that the minimum sample size needed to use this approach is 25 patients
either with unrelated or first cousin parents. These results show that the HI can be used to produce a ranking order of
allele frequencies of autosomal recessive disorders, especially in populations with high rates of consanguineous marriages.
Keywords: Inbreeding, disease allele frequency, autosomal recessive disorders, homozygosity index
∗ Corresponding author: Giovanni Romeo, Unità Operativa di Genetica Medica, Dipartimento di Scienze Ginecologiche, Ostetriche e Pediatriche, Policlinico Sant’Orsola Malpighi, Università di Bologna, Via Massarenti 9, 40136 Bologna, Italy. Tel: 0039 051 2088420; Fax: 0039 051 5870611; E-mail: [email protected]
Introduction
Studies of consanguinity have taken advantage of the relationship between the frequency of a given autosomal recessive
disorder and the proportion of offspring of consanguineous
couples affected with the same disorder. If the inbreeding coefficient between parents is known, this approach allows one
to calculate the frequency of autosomal recessive disorders in a
way that is free of the problems arising from missed diagnoses
(incomplete ascertainment). This requires first of all a fairly
precise computation of the inbreeding coefficient (F) in the
general population. In classical studies of consanguinity, the
gene frequency (q), and thus the prevalence (P) of autosomal
recessive conditions has been inferred from the ratio between
the frequency of consanguineous parents of children affected
with a specific autosomal recessive disorder and the frequency
of consanguineous couples of the same degree of kinship in
the general population. This type of approach has made use
of mathematical formulas worked out by population geneticists like Dahlberg (1948) and has made it possible to estimate
the prevalence, among others, of Phenylketonuria, Friedreich
ataxia, and Cystic Fibrosis in Italy (Romeo et al., 1983a, b,
1985).
Annals of Human Genetics (2012) 76, 159–167. © 2011 The Authors. © 2011 Blackwell Publishing Ltd/University College London.
The problem always encountered in this type of genetic epidemiology study is the difficulty of
obtaining a reliable estimate of the frequency of consanguineous
couples in the general population for any given degree of relationship. In Italy this estimate has been made possible up to
1964 by the availability of centralized archives of the Catholic
Church, which kept records in Rome of all the dispensations
(or permits to marry in Church) awarded by the Pope for consanguineous marriages celebrated in Italy during a period of
almost 400 years. These demographic data were collected and
organized by Cavalli-Sforza et al. (2004). The absence of an
equivalent centralized archive of consanguinity in countries
other than Italy has hindered the use of this epidemiological
approach.
We therefore used a novel approach to estimate the
gene frequency for rare autosomal recessive disorders using
mutational records of patients. This approach is based on the
possibility that the affected child can carry a double copy of the
same causative mutation (homozygosity) or alternatively two
different causative mutations in the same gene (compound
heterozygosity). Although in the former case the mutant alleles can be identical by descent (IBD) or by state (IBS), in
the latter case the two mutations must have been inherited
through two different ancestors even if the parents are consanguineous, which implies that the two mutant alleles are
not IBD. The proportion of compound heterozygotes among
children affected with a given autosomal recessive disorder
is dependent not only on the frequency of consanguineous
marriages but also on the relative frequency of the different
pathogenic alleles. The theoretical basis of this approach has
been discussed in a recent paper by ten Kate et al. (2010),
starting from the relationship between the frequency of compound heterozygotes among the patients affected with a given
autosomal recessive condition (P(CH)) and the frequency of
the relevant disease allele in the general population (q). In
their paper, ten Kate et al. introduced theoretical calculations
to demonstrate how q is positively correlated with P(CH)
and with the inbreeding coefficient (F) among the probands,
thus generating an additional tool to infer the frequency of
autosomal recessive diseases. In this paper we demonstrate
the feasibility of the Homozygosity Index (HI) approach using
mutation data from different Mediterranean countries where
consanguinity rates are high.
Materials and Methods
Diseases and Patients
We used mutation datasets of individuals affected with two different autosomal recessive disorders, namely Familial Mediterranean Fever (FMF, OMIM #249100) and Phenylketonuria
(PKU, OMIM #261600).
FMF is an autosomal recessive inflammatory disorder affecting people originating from areas around the
Mediterranean Sea basin, due to mutations in the MEFV
(MEditerranean FeVer) gene (Medlej-Hashim et al., 2005;
Mattit et al., 2006). More than 70 MEFV pathogenic mutations have been identified, the five most common (representing 80% of the total) being M694V, V726A, M694I, M680I in
exon 10, and E148Q in exon 2 (Medlej-Hashim et al., 2005;
El Shanti et al., 2006; Yilmaz et al., 2009; Moradian et al.,
2010).
The symptoms and severity of the inflammation vary depending on the type and number of mutations in the MEFV
gene (M694V being the most severe and penetrant) which
can be found in the heterozygous, homozygous, or compound heterozygous state in different patients (Mattit et al.,
2006; Moradian et al., 2010). Furthermore, it is not infrequent to find patients carrying three or more mutations and
patients carrying complex alleles (Medlej-Hashim et al., 2005;
Moradian et al., 2010). MEFV mutations in the homozygous
or compound heterozygous state are likely to determine the
most severe forms of FMF. However, since such genotypes are
detected in only 41–76% of patients, it has been hypothesized
that regulatory mutations may go undetected (Yilmaz et al.,
2009) or, alternatively, that unknown modifier genes and/or
environmental factors may affect the expression of the disease
(Touitou, 2001).
Moreover, the FMF carrier rate can be as high as one
in three in some ethnic groups (like Armenians), a finding
which in turn raises the possibility of a selective heterozygote
advantage (El Shanti et al., 2006).
PKU is one of the most common inborn errors of amino
acid metabolism and by far (98%) the most frequent form of
HyperPhenylAlaninemia, a group of diseases characterized by
the persistent elevation of phenylalanine levels in tissues and
biological fluids (Zare-Karizi et al., 2010). PKU is due to deficiency in phenylalanine hydroxylase. This enzyme is coded
by the PAH gene, which consists of 13 exons (Zare-Karizi
et al., 2010). So far, more than 500 different mutations have
been identified and described in PAH, with various phenotypic consequences (Santos et al., 2010). Most of them are
point mutations and microdeletions, usually localized to the
coding region or the intron–exon boundaries of the gene,
mainly in its 3′ region (Berchovich et al., 2008b). The number of different mutations in a given population is usually
high, with a few prevalent mutations and a large number of
private mutations (Berchovich et al., 2008a). This results in a
high number of compound heterozygous affected individuals
(Santos et al., 2010). Several studies indicate that the prevalence of the disorder and the spectrum of mutations differ
among populations (Berchovich et al., 2008a, b; Zare-Karizi
et al., 2010).
Collection of Mutation Data for Autosomal Recessive Disorders from Offspring of Consanguineous and Nonconsanguineous Parents
Mutational records were obtained from the following diagnostic laboratories: Medical Genetics Unit, St. Joseph
University, Beirut, Lebanon; Genetics Department, Institute for Experimental Medicine, Istanbul University,
Istanbul, Turkey (for FMF) and Metabolic Disease Unit, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, Tel Hashomer, Israel (for PKU). Drawings of patients’
pedigrees were collected only if they were born to consanguineous parents, and only the offspring of first cousins were
included in the consanguineous group for the purpose of this
work. Closer consanguineous relationships were not observed
in the sample, whereas the offspring of more distantly related
cousins and/or with small sample size were not included at all
in the study (five FMF patients born to second cousins and
11 born to third cousins or more distantly related individuals in the Lebanese sample; five PKU patients born to first
cousins, one born to second cousins and one born to individuals whose degree of kinship was not well defined in the
Israeli Jews sample; four PKU patients born to second cousins,
one born to third cousins and three patients born to parents
whose degree of kinship was not well defined in the Israeli
Arab sample). Written informed consent was available for every patient. Only homozygous or compound heterozygous
genotypes for alleles previously reported as disease associated
(Touitou, 2001; Medlej-Hashim et al., 2005; Mattit et al.,
2006; Berchovich et al., 2008a, b) were considered in the
allele count.
Estimation of q from the Proportion of Homozygous Genotypes
Assuming that: (1) we have a set of genotypes from individuals
affected with an autosomal recessive disease; (2) every genotype is from an individual of the population with a known
inbreeding coefficient F; (3) genotypes are only homozygous
or compound heterozygous for the disease alleles of a single
gene; (4) these alleles are identified only by a single disease-associated variant (i.e., different haplotypes of the same variant
are not considered as different alleles); and (5) they strictly act
in a recessive manner (no phenotypic effect on heterozygotes);
then we can calculate the HI as the number of homozygotes
(HOM) over the total of homozygous and compound heterozygous (CH) genotypes (i.e., the total number of patients):
$$\mathrm{HI} = \frac{\mathrm{HOM}}{\mathrm{HOM} + \mathrm{CH}}. \tag{1}$$
The equation introduced by ten Kate et al. (2010) derives
q from the inbreeding coefficient (F), the proportion of compound heterozygotes (P(CH)), and the proportion of compound heterozygotes among non-IBD genotypes (R(CH)):
$$q = \frac{P(\mathrm{CH}) \times [F + q - (F \times q)]}{(1 - F) \times R(\mathrm{CH})}, \tag{2}$$
being
$$R(\mathrm{CH}) = 1 - R(\mathrm{HN}) = 1 - \sum_i q_i^2, \quad [i = 0, 1, 2, \ldots, n], \tag{3}$$
where R(HN) is the proportion of IBS genotypes among
non-IBD genotypes and qi is the relative frequency of the ith
disease allele (with Σi qi = 1).
The above equation (2) can be also expressed as
$$q = \frac{P(\mathrm{CH}) \times F}{[R(\mathrm{CH}) - P(\mathrm{CH})] \times (1 - F)}. \tag{4}$$
When homozygous genotypes are broken down into Homozygous IBD (IBD) and non-IBD (HN), HI can be written
as
$$\mathrm{HI} = \frac{P(\mathrm{HN}) + P(\mathrm{IBD})}{P(\mathrm{HN}) + P(\mathrm{IBD}) + P(\mathrm{CH})}. \tag{5}$$
The proportion of non-IBD homozygotes can be calculated as
$$P(\mathrm{HN}) = \frac{R(\mathrm{HN}) \times (1 - F) \times q^2}{(F \times q) + (1 - F) \times q^2} = \frac{\sum_i q_i^2 \times (1 - F) \times q}{F + (1 - F) \times q} \tag{6}$$
(ten Kate et al., 2010).
According to Li (1955), the proportion of homozygotes
IBD among the offspring of consanguineous couples is
$$P(\mathrm{IBD}) = F \times q. \tag{7}$$
And, therefore, P(CH) can be expressed as
$$P(\mathrm{CH}) = \frac{R(\mathrm{CH}) \times (1 - F) \times q^2}{(F \times q) + (1 - F) \times q^2} = \frac{\left(1 - \sum_i q_i^2\right) \times (1 - F) \times q}{F + (1 - F) \times q} \tag{8}$$
(ten Kate et al., 2010).
From (5) to (8) we can conclude that
$$\mathrm{HI} = \frac{\sum_i q_i^2 \times (1 - F) \times q + F}{[(1 - F) \times q] + F}, \tag{9}$$
and q can be expressed as a function of HI
$$q = \frac{F \times (1 - \mathrm{HI})}{\left(\mathrm{HI} - \sum_i q_i^2\right) \times (1 - F)}. \tag{10}$$
It is evident that, in the studied population (i.e., under the
above-mentioned assumptions), HI = 1 − P(CH).
In addition to samples of patients born to related parents,
it should be possible in theory to apply this formula also to
samples of patients born to apparently unrelated parents, when
F for that specific population is known.
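The relationships in Equations (1), (4), (9) and (10) can be sketched in a few lines of Python. This is only an illustration: the function names, the encoding of genotypes as pairs of allele labels, and the numerical values in the usage lines are hypothetical and are not taken from the original analysis.

```python
def homozygosity_index(genotypes):
    """Equation (1): share of homozygous genotypes among all patients.

    `genotypes` is a list of (allele, allele) pairs; the labels are hypothetical.
    """
    hom = sum(1 for a, b in genotypes if a == b)
    return hom / len(genotypes)


def r_hn(rel_freqs):
    """R(HN): sum of squared relative allele frequencies (which must sum to 1)."""
    if abs(sum(rel_freqs) - 1.0) > 1e-9:
        raise ValueError("relative allele frequencies must sum to 1")
    return sum(f * f for f in rel_freqs)


def hi_from_q(q, F, rel_freqs):
    """Equation (9): expected HI for a total disease-allele frequency q."""
    return (r_hn(rel_freqs) * (1 - F) * q + F) / ((1 - F) * q + F)


def q_from_hi(hi, F, rel_freqs):
    """Equation (10): q as a function of HI, F and the allele spectrum."""
    denom = (hi - r_hn(rel_freqs)) * (1 - F)
    if denom <= 0:
        raise ValueError("HI must exceed R(HN) for a positive estimate of q")
    return F * (1 - hi) / denom


def q_from_pch(p_ch, F, rel_freqs):
    """Equation (4): the same estimate written in terms of P(CH)."""
    r_ch = 1 - r_hn(rel_freqs)
    return (p_ch * F) / ((r_ch - p_ch) * (1 - F))


# Hypothetical example: a first-cousin sample (F = 1/16) with three alleles.
F = 0.0625
spectrum = [0.6, 0.3, 0.1]
hi = hi_from_q(0.05, F, spectrum)
q_back = q_from_hi(hi, F, spectrum)      # recovers 0.05 up to rounding error
q_alt = q_from_pch(1 - hi, F, spectrum)  # Equation (4) with P(CH) = 1 - HI
```

Because HI = 1 − P(CH) under the stated assumptions, `q_from_pch` and `q_from_hi` should agree; the guard in `q_from_hi` reflects the fact that Equation (10) gives a positive value only when HI exceeds R(HN).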
Validation of the Estimates of q Using
Equation (10)
To assess whether our estimates of q were reliable, we collected
information about disease frequencies in the same populations from which the mutational records were obtained. Data
on disease prevalence, disease allele frequencies, and the total disease allele frequency were obtained from traditional
epidemiological and mutational reports (Ozen et al., 1998;
Dinc et al., 2000; Mattit et al., 2006; Berchovich et al.,
2008a) and from the collaborating diagnostic laboratories (see
Table 1).
Sample Size Requirements for Estimating q
We simulated a series of populations of 1000 genotypes with
three pathogenic alleles. Different frequencies of the main
pathogenic allele were set, with q1 ranging from 0.4 to 0.8.
For every q1 value we built two populations: one with F =
0.0625 and one with F = 0.001. We then determined
the genotype distribution among the 1000 affected individuals
as a function of q, q1, q2, q3, HI and F (through Sewall
Wright’s F-statistics), by means of a random-based model built
in an Excel® worksheet. Among the populations created, we
chose those with realistic values of q (ranging between q =
0.1 and q = 0.002).
Then, we randomly extracted from every set of 1000 genotypes 100 samples of n genotypes (with n = 10, 25, 35, 50, 75,
100) using a custom Perl script. For each sample, we could
estimate q using Equation (10). To simplify calculations, we
considered only q1 in q computation (i.e., we put R(HN) =
q1²), given that this does not significantly affect the estimate of
q (as explained in the Theoretical Model and Simulations section).
We therefore calculated, for each sample size, the Confidence
Interval of q with α = 0.05 (CI95% ), thus producing a reliable
index of the accuracy of q estimates in the population (see
Online Supplementary Material S1 for more details). As expected, CI95% generally shrinks as the sample size (n) increases
and q of the population (qpop ) always falls within this range or
is very close to it. Slight inconsistencies are probably due to
random variation and outlier values that occasionally appear
in the samplings.
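The resampling idea described above can be sketched in Python as follows. This is not the authors' Excel and Perl implementation: the function names are hypothetical, homozygous patients are drawn directly with probability HI from Equation (9), the full R(HN) is used rather than the q1² simplification, and the 95% interval is taken as an empirical percentile interval, which is an assumption about how such an interval might be computed.

```python
import random
import statistics


def hi_expected(q, F, r_hn):
    """Equation (9): expected proportion of homozygous patients."""
    return (r_hn * (1 - F) * q + F) / ((1 - F) * q + F)


def q_estimate(hi, F, r_hn):
    """Equation (10); returns None when HI <= R(HN) (no positive estimate)."""
    denom = (hi - r_hn) * (1 - F)
    return F * (1 - hi) / denom if denom > 0 else None


def simulate_q_estimates(q, F, rel_freqs, n, replicates=100, seed=0):
    """Draw `replicates` samples of n patients and estimate q from each one."""
    rng = random.Random(seed)
    r_hn = sum(f * f for f in rel_freqs)
    p_hom = hi_expected(q, F, r_hn)
    estimates = []
    for _ in range(replicates):
        # Each simulated patient is homozygous with probability p_hom.
        hom = sum(rng.random() < p_hom for _ in range(n))
        est = q_estimate(hom / n, F, r_hn)
        if est is not None:
            estimates.append(est)
    return estimates


def percentile_interval(values):
    """Empirical 2.5th and 97.5th percentiles of the estimates."""
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


# Hypothetical population: q = 0.05, first-cousin parents, three alleles.
for n in (10, 25, 50, 100):
    sample_estimates = simulate_q_estimates(0.05, 0.0625, [0.6, 0.3, 0.1], n)
    low, high = percentile_interval(sample_estimates)
    print(n, round(low, 4), round(high, 4))
```

Samples in which the observed HI does not exceed R(HN) are skipped here because Equation (10) has no positive solution for them; this becomes more likely as n decreases.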
Results
Theoretical Model and Simulations
We investigated the relationships among the different variables
in the model through simulation. We observed a positive correlation between q and q1 (the relative allelic frequency of the
major pathogenic allele). More specifically, q1 affects q more
than the other pathogenic alleles, which are increasingly irrelevant as their relative frequencies decrease (Fig. 1). Indeed,
if we replace R(HN) with q1² in Equation (10), so that only q1 is considered
in computing q, this does not significantly affect
the estimate of q, as shown in Fig. 1. Moreover, q is inversely
correlated to R(HN) and HI (see Fig. 2), which is in perfect
agreement with the general postulate that the rarer the disorder, the higher the frequency of homozygotes among affected
individuals.
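The effect of replacing R(HN) with q1² can be explored with a short Python sketch. The function names and the allele spectra below are hypothetical; HI is generated from Equation (9) with the full spectrum and q is then recovered from Equation (10) twice, once with R(HN) and once with q1² only.

```python
def hi_from_q(q, F, rel_freqs):
    """Equation (9) with R(HN) computed from the full allele spectrum."""
    r_hn = sum(f * f for f in rel_freqs)
    return (r_hn * (1 - F) * q + F) / ((1 - F) * q + F)


def q_from_hi(hi, F, r_value):
    """Equation (10), with r_value standing in for the sum of squared q_i."""
    return F * (1 - hi) / ((hi - r_value) * (1 - F))


def compare_estimates(q, F, rel_freqs):
    """Return (q based on R(HN), q based on q1 squared only)."""
    hi = hi_from_q(q, F, rel_freqs)
    r_hn = sum(f * f for f in rel_freqs)
    q1_squared = max(rel_freqs) ** 2
    return q_from_hi(hi, F, r_hn), q_from_hi(hi, F, q1_squared)


def relative_gap(q, F, rel_freqs):
    """(q_RHN - q_q1) / q_RHN, the ratio discussed in the Figure 1 caption."""
    q_full, q_major = compare_estimates(q, F, rel_freqs)
    return (q_full - q_major) / q_full


# Hypothetical three-allele spectra: the mass not carried by q1 is split equally.
for q1 in (0.4, 0.5, 0.6, 0.7, 0.8):
    rest = (1 - q1) / 2
    print(q1, round(relative_gap(0.05, 0.0625, [q1, rest, rest]), 3))
```

Since q1² never exceeds R(HN), the q1²-based value cannot be larger than the R(HN)-based one. In this hypothetical setting the relative gap shrinks as q1 grows; its size at low q1 depends on how the remaining frequency is distributed among the minor alleles.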
Table 1 Patients studied and relative disorders, countries of origin and centers in which they were tested, along with the sample sizes.
Gene | Disease | Country | Center | N (relationship between parents)
MEFV | FMF | Lebanon | St. Joseph University, Beirut | 34 (1C^a); 107 (UR^b)
MEFV | FMF | Turkey | Institute of Experimental Medicine, Istanbul | 55 (UR^b)
PAH | PKU | Israel | Metabolic Disease Unit, Sheba Medical Center, Tel Hashomer | 30 (UR^b); 8 (1C^a); 87 (UR^b)
a Born to first cousins.
b Born to apparently unrelated parents. Among the PKU patients, the 87 UR were Israeli Jews and the rest Israeli Arabs.
Figure 1 The R(HN)-based (qR(HN)) vs. q1²-based q (qq1) plot (with F = 0.0625) clearly shows how q2,...,n−1,n are increasingly irrelevant in the estimate of q. Indeed, while the difference qR(HN) − qq1 slightly increases, the (qR(HN) − qq1)/qR(HN) ratio decreases for increasing q1 (data not shown).
Figure 2 q vs. HI and q vs. R(HN) plots (with F = 0.0625). In both graphics q exhibits a negative exponential-like decrease as HI/R(HN) increases.
HI is also positively correlated with q1; the magnitude of this correlation is also affected by F, which is in direct correlation with
HI. In other words, the higher the inbreeding coefficient between the parents of the probands, the higher will be the
probability of a single mutation occurring in homozygosity.
This is clear from Figure 3, which illustrates the range of
values of HI (maximum/minimum HI) versus q1 , in subjects
born from consanguineous and nonconsanguineous couples.
Figure 3 also suggests that, should we encounter a population
with a strikingly prevalent pathogenic allele, the differences in
HI between a hypothetical sample of probands born to first
cousins and one of probands born to unrelated individuals
would be very small.
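The construction of the maximum and minimum HI curves described in the Figure 3 caption can be sketched in Python. The function names are hypothetical, and F = 0.001 for the unrelated group is an assumption borrowed from the low-inbreeding populations of the simulation section; the text does not state which F was used for the unrelated curves in the figure.

```python
def hi_from_q(q, F, rel_freqs):
    """Equation (9): expected HI given q, F and relative allele frequencies."""
    r_hn = sum(f * f for f in rel_freqs)
    return (r_hn * (1 - F) * q + F) / ((1 - F) * q + F)


def hi_range(q1, q, F):
    """Return (HI min, HI max) for a three-allele system with major allele q1.

    HI max: q2 = 1 - q1 and q3 = 0; HI min: q2 = q3 = (1 - q1) / 2.
    """
    hi_max = hi_from_q(q, F, [q1, 1 - q1, 0.0])
    hi_min = hi_from_q(q, F, [q1, (1 - q1) / 2, (1 - q1) / 2])
    return hi_min, hi_max


F_FIRST_COUSINS = 0.0625
F_UNRELATED = 0.001  # assumed value, see the lead-in
Q_TOTAL = 0.1        # fixed q, as in the Figure 3 caption

for q1 in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    min_1c, max_1c = hi_range(q1, Q_TOTAL, F_FIRST_COUSINS)
    min_ur, max_ur = hi_range(q1, Q_TOTAL, F_UNRELATED)
    print(q1, round(max_1c - max_ur, 3), round(min_1c - min_ur, 3))
```

Under these assumptions the printed differences between the first-cousin and unrelated curves narrow as q1 approaches 1, where every curve reaches HI = 1.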
HI increases with F (as confirmed by Sewall
Wright’s F estimation of heterozygous individuals in a population) and is positively correlated with q1; therefore, for a fixed
value of q, q1 will increase as F of the sample decreases (to
keep q constant, see Fig. 4).
Application of Equation (10) to Real Sample
Data Sets
As summarized in Table 2, we tested six samples of patients
for whom we knew individual genotypes and degree of relationship between parents.
The mutational spectra for each of the six samples, with
the relative values of HI (genotype not shown) and all the
mutational data available are summarized in Fig. S1a–f.
Because of the difficulties in estimating the inbreeding coefficient for the populations taken into consideration, we had
to set approximate values of F for the samples made up of
probands born to unrelated subjects. In some cases (such as
the Lebanese and Turkish samples) we used the values indicated
by the collaborating diagnostic laboratories. Whenever possible we chose F values among those published and/or reported
on the http://www.consang.net website or computed them as
a mean of the data reported on the above-mentioned website
(see Table S1). More specifically, we tried to select the most
reliable values, with special regard to data relative to specific
ethnic groups (namely the Israeli Arab and Jewish samples).
Figure 3 Maximum/minimum HI estimates vs. q1 plot, for simulated affected individuals born to first cousins (1C) and unrelated (UR) parents (for fixed q = 0.1 in a hypothetical three-allele system). For a given value of q1, we set q2 = 1 − q1 and q3 = 0 to compute HI max (i.e., to maximize HI) and q2 = q3 = (1 − q1)/2 to compute HI min (i.e., to minimize HI). Both HI max plots exhibit an initial small decrease and an exponential-like rise thereafter. On the contrary, both HI min show a slightly increasing trend. All of the four curves hit a peak at HI = 1 (the maximum theoretical HI value possible), thus reducing the range of variation of HI as q1 increases.
Figure 4 q1 vs. F plot, for fixed q = {0.05; 0.1; 0.15; 0.2}. A positive logarithm-like correlation between the considered variables is noticed. Indeed, it is necessary for q1 to rise to counterbalance the F decrease, to keep q constant. The trend remains the same for different values of q, on different scales (with q1 growing with increasing q).
Sample Size Requirements for Estimating q
Our simulations indicate that for a sample of 25 patients showing three different alleles, fairly precise and reliable estimates
of q can be obtained with allele frequency of 0.4 ≤ q1 ≤ 0.8
(Fig. S2a, b). We decided to study q variation within the range
q1 = (0.4–0.8) for two main reasons. Indeed, whatever the F,
in real populations we usually expect to find more than three
pathogenic alleles. The first implication is that the greater the
number of alleles, the more balanced the ratio among allele
frequencies will be, with the upper limit of q1 rarely exceeding
0.8. With regard to the lower limit of the interval, hypothesizing q1 ≤ 0.4 in a three-allele system with q1 increments of
0.1 becomes internally contradictory.
Discussion
This work is based on the estimate of disease-allele frequencies
(therefore of the prevalence of the corresponding autosomal
recessive disorders) in the general population, relying only
on mutational records. We believe that these records, if used
extensively, will generate in the future a useful epidemiological picture of the frequency of autosomal recessive disorders
in different populations. The estimate of HI is directly
dependent on the relative frequencies of the pathogenic
alleles and inversely correlates with the global frequency of
these alleles in the general population (q). Therefore, HI can
link together all the variables considered so far in the genetic
epidemiology of autosomal recessive disorders, namely q,
q1,2,...,n−1,n and F.
Table 2 Comparison between total allele frequency (q)/prevalence (P) estimated by the present method (first value) and those previously calculated by traditional methods (value in parentheses in the last two columns).
Gene | Country | F | Sample size | HI | q | P
MEFV | Lebanon | 0.0625 | 34 | 0.5588 | 0.123 (0.095^d) | 1:66 (^c)
MEFV | Lebanon | Unrelated^a, 0.0127 | 107 | 0.3458 | 0.09 (0.095^d) | 1:123 (^c)
MEFV | Turkey | Unrelated^a, 0.005 | 55 | 0.4545 | 0.038 (^c) | 1:691 (1:1075^e–1:1000^f)
PAH | Israel (Arabs) | Unrelated^a, 0.0175 | 30 | 0.5667 | 0.017 (^c) | 1:3501 (1:8200^g, among Non-Jews)
PAH | Israel (Arabs) | 0.0625 | 8^b | 0.875 | 0.0126 (^c) | 1:6344 (1:8200^g, among Non-Jews)
PAH | Israel (Jews) | Unrelated^a, 0.0022 | 87 | 0.2299 | 0.011 (^c) | 1:8046 (1:12500^g)
a F arbitrarily chosen/calculated, relying on previously published inbreeding data, see Tables of the global prevalence of consanguinity, http://www.consang.net/index.php/Global_prevalence_tables and Supplementary Table S1.
b Further mutational data from the Israeli-Arab population are being collected to confirm this preliminary estimate.
c Data not available.
d Mattit et al., 2006.
e Ozen et al., 1998.
f Dinc et al., 2000.
g Berchovich et al., 2008a.
The results based on the HI approach (Table 2) are in agreement with previous estimates of FMF and PKU prevalence
in the populations examined, which were based on descriptive epidemiological studies and on much larger samples of
patients (see Table 2 and Ozen et al., 1998; Dinc et al., 2000;
Mattit et al., 2006, Berchovich et al., 2008a).
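The prevalences in Table 2 appear consistent with the usual conversion P ≈ q² for an autosomal recessive disorder under random mating. This is a reading of the table rather than a formula stated in the text, so the following Python sketch should be taken as an assumption; the function names are hypothetical.

```python
def prevalence_from_q(q):
    """Approximate prevalence P = q**2 (assumes random mating in the population)."""
    return q * q


def one_in(q):
    """Express the prevalence as '1 in N', rounding N to the nearest integer."""
    return round(1 / prevalence_from_q(q))


# q estimates for the two Lebanese FMF samples as reported in Table 2.
print(one_in(0.123))  # 66, matching the tabulated 1:66
print(one_in(0.09))   # 123, matching the tabulated 1:123
```

For the remaining rows the same conversion gives values close to, but not identical with, the tabulated ones, presumably because the published q values are rounded.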
This generally holds true also for the PKU Israeli Arab
samples, where the fact that the q/P estimate is higher in
the sample of patients born to unrelated parents than among
patients born to first cousins should not be surprising, as the q/P
value calculated through this approach always refers to the
general population investigated and not to the specific sample. In other words, these two prevalences represent two point
estimates of the same population parameter, not two different parameters characteristic of the samples. This also applies
to the FMF Lebanon samples. However, for a given disorder
in a given population, it is theoretically possible that a sample of patients born to unrelated parents gives a q/P greater
than a sample of patients born to consanguineous ones: it
would be sufficient that q1 (the relative frequency of the main
pathogenic allele) is so large as to outweigh even a small F in
the “unrelated” sample or, conversely, that q1 is so small as to
counterbalance the effect of a high F in the “related” sample
(as in this case, see Fig. S1d). Although this is very unusual
in large samples, for very small samples like our Israeli Arab
sample it can happen because qi fluctuation is notably affected
by single genotypes.
The use of F for the group of apparently unrelated parents
is the weak point of this approach, due to the unavailability of
accurate F estimates for the populations examined. Although
for the first-cousin samples we can rely on the estimate (F
= 0.0625) based on pedigree reconstruction, despite evidence
that this might be underestimated (Woods et al., 2006), the
estimate of F for the samples with apparently unrelated parents based on demographic data is not equally reliable. Several
studies have tried to infer F in some populations using different experimental methods. The most reliable ones seem
to be those based on the measurement and count of Runs
of Homozygosity (ROH) with a given minimal length in
the genome (Carothers et al., 2006; McQuillan et al., 2008;
Polašek et al., 2010). Alternatively, a novel statistical method
to estimate the length of ROH (thus the inbreeding coefficient), relying on a maximum-likelihood approach based on
a Hidden Markov Model, has been proposed (Leutenegger
et al., 2003). Further research on the estimate of F is needed.
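As an illustration of the ROH-based idea, one common definition takes the inbreeding coefficient as the fraction of the genome covered by runs of homozygosity above a minimal length. The Python sketch below follows that definition; the function name, the segment lengths, the length threshold and the genome length are all hypothetical placeholders, not values from the studies cited.

```python
def f_roh(roh_lengths, genome_length, min_length):
    """Fraction of the genome covered by ROH of at least `min_length`.

    All three arguments must be expressed in the same unit (e.g., kb).
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(length for length in roh_lengths if length >= min_length)
    return covered / genome_length


# Hypothetical individual: four ROH segments (kb); one falls below the threshold.
segments_kb = [2_000, 750, 12_500, 4_300]
f_individual = f_roh(segments_kb, genome_length=2_700_000, min_length=1_500)
```

A population-level F for use in Equation (10) would then have to be derived from such individual values, for example as their mean over a representative sample; that step is also an assumption here.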
Another possible limitation could be the relatively small
number of probands that can be collected in each population, because autosomal recessive disorders are usually rare
conditions and probands born to consanguineous parents are
even rarer. It is interesting that our sample size analysis suggests that we need only 25 patients to ensure a reliable estimate of q. If we exclude simple heterozygotes (which are
not taken into account in the model), our approach can give
reliable estimates of disease allele frequencies for rare autosomal recessive disorders. In fact, a high HI will imply a less
biased estimate of q, and therefore a smaller confidence interval for q estimates (ten Kate et al., 2010). Such a high
HI could be observed in some specific disorders like Cystic
Fibrosis, especially in those countries where ΔF508 (the main
pathogenic allele) reaches very high relative frequencies, entailing a high prevalence of homozygous individuals in the
population. More importantly, a high HI can occur in samples of affected children of consanguineous parents, therefore
making our model of great help in those countries where the
frequency of consanguineous marriages is high. In practical
terms, in every population or ethnic group, we can produce
a ranking order of the prevalence of autosomal recessive disorders. This will have social and clinical relevance and will
allow the establishment of priorities for genetic testing at the
population level.
Given the very low figures shown by disease allele frequency and disease prevalence in the general population, it is
impossible to estimate q with great precision. Highly accurate
estimates will become possible when all the sources of bias
are under control. However, in light of the potential applications of this approach, we are at present more interested
in building a ranking order of prevalence of autosomal recessive disorders rather than in estimating their prevalence in a
very precise way.
In conclusion, we propose to collect mutation data from
the offspring of consanguineous marriages (as well as of apparently unrelated parents) affected with autosomal recessive
disorders from different molecular diagnostic laboratories in
all the countries with high consanguinity rates. We will obtain in every population a ranking order according to the
different HIs, whose values will be inversely related to
the frequency of the disorder. This approach will have the
advantage over traditional descriptive epidemiology studies of
generating an estimate of the relative frequency of the different autosomal recessive disorders, free of the bias due to
underdiagnosis. Moreover this approach will not need the
collection of very large samples or additional mutation data
from the general population of unaffected individuals. From a
decision-making point of view, this new combined approach
of molecular and genetic epidemiology based on consanguinity should become useful to establish priorities for genetic
screening and to assess the opportunity of widespread genetic
screening for certain autosomal recessive disorders with respect to others, especially in those countries where there is a
high frequency of consanguineous marriages.
Therefore, it will have potentially useful applications in epidemiological studies of autosomal recessive disorders. Indeed,
it will allow researchers to build, in each population, a ranking
order of prevalence of several autosomal recessive disorders,
relying only on data already available (i.e., the genotype distribution and the mutational records of a sample of patients,
along with their pedigree information). This means savings
of economic resources, which is a very important aspect in
planning genetic screening programs at a population level in
developing countries. As a consequence, it will be possible
to concentrate resources directly on the prevention of those
autosomal recessive disorders which are most frequent in a
given country.
Finally, the approach based on mutation analysis in offspring
of consanguineous parents can be integrated in the Locus Specific DataBases (LSDBs) which have been rapidly increasing
in number during the last decade (Romeo, 2010; van Baal
et al., 2010) and in new research projects and networks like the
one recently proposed by a group of medical geneticists from
different countries of the Mediterranean Sea basin (Ozcelik
et al., 2010). It is therefore advisable that mutational records
report from now on the degree of relationship of parents (if
consanguineous) besides the molecular characterization of the
mutation present in each patient.
Acknowledgements
The authors wish to thank Prof. L.P. ten Kate for useful discussions, Prof. Dr. Ahmet Gul for sharing Turkish-FMF patient
genotype information and Dr. S. Presciuttini for technical advice
in the statistical analysis.
References
Berchovich, D., Elimelech, A., Yardeni, T., Korem, S., Gal, N.,
Goldstein, N., Vilensky, B., Segev, R., Avraham, S., Loewenthal,
R., Schwarts, G. & Anikster, Y. (2008a) A mutation analysis of the
Phenylalanine Hydroxylase (PAH) gene in the Israeli population.
Ann Hum Genet 72, 305–309.
Berchovich, D., Elimelech, A., Zlotogora, J., Korem, S., Yardeni, T.,
Gal, N., Goldstein, N., Vilensky, B., Segev, R., Avraham, S.,
Loewenthal, R., Schwarts, G. & Anikster, Y. (2008b) Genotype-phenotype correlations analysis of mutations in the phenylalanine
hydroxylase (PAH) gene. J Hum Genet 53, 407–418.
Carothers, A.D., Rudan, I., Kolcic, I., Hayward, C., Wright, A.F.,
Campbell, H., Teague, P., Hastie, N.D. & Weber, J.L. (2006) Estimating human inbreeding coefficients: Comparison of genealogical and marker heterozygosity approaches. Ann Hum Genet 70,
666–676.
Cavalli-Sforza, L.L., Moroni, A. & Zei, G. (2004) Consanguinity,
inbreeding and genetic drift in Italy. Princeton, NJ: Princeton University Press.
Dahlberg, G. (1948) Mathematic methods for population genetics. Basel:
S. Karger.
Dinc, A., Pay, S., Turan, M. & Simsek, I. (2000) Prevalence of
Familial Mediterranean Fever in young Turkish men. Clin Exp
Rheumatol 18, 292.
El-Shanti, H., Abdel Majeed, H. & El-Khateeb, M. (2006) Familial
Mediterranean fever in Arabs. Lancet 367, 1016–1024.
Leutenegger, A.L., Prum, B., Génin, E., Verny, C., Lemainque, A.,
Clerget-Darpoux, F. & Thomson, E.A. (2003) Estimation of the
inbreeding coefficient through the use of genomic data. Am J
Hum Genet 73, 516–523.
Li, C.C. (1955) Population Genetics. Chicago: University of Chicago
Press.
Mattit, H., Joma, M., Al-Cheikh, S., El-Khateeb, M., Medlej-Hashim, M., Salem, N., Delague, V. & Mégarbané, A. (2006)
Familial Mediterranean Fever in the Syrian population: Gene
mutation frequencies, carrier rates and phenotype-genotype
correlation. Eur J Hum Genet 49, 481–486.
McQuillan, R., Leutenegger, A.L., Abdel-Rahman, R., Franklin,
C.S., Pericic, M., Barac-Lauc, L., Smolej-Narancic, N., Janicijevic,
B., Polasek, O., Tenesa, A., MacLeod, A.K., Farrington, S.M.,
Rudan, P., Hayward, C., Vitart, V., Rudan, I., Wild, S.H.,
Dunlop, M.G., Wright, A.F., Campbell, H. & Wilson, J.F. (2008)
Runs of Homozygosity in European populations. Am J Hum Genet
83, 359–372.
Medlej-Hashim, M., Serre, J.L., Corbani, S., Saab, O., Delague, V.,
Chouery, E., Salem, N., Loiselet, J., Lefranc, G. & Mégarbané,
A. (2005) Familial Mediterranean fever (FMF) in Lebanon and
Jordan: A population genetics study and report of three novel
mutations. Eur J Med Genet 48, 412–420.
Moradian, M.M., Sarkisian, T., Ajrapetyan, H. & Avanesian, N.
(2010) Genotype–phenotype studies in a large cohort of Armenian patients with Familial Mediterranean Fever suggest clinical
disease with heterozygous MEFV mutations. J Hum Genet 55,
389–393.
Ozcelik, T., Kanaan, M., Avraham, K.B., Yannoukakos, D.,
Megarbane, A., Tadmouri, G.O., Middleton, L., Romeo, G.,
King, M.C. & Levy-Lahad, E. (2010) Collaborative genomics
for human health and cooperation in the Mediterranean region.
Nat Genet 42, 641–645.
Ozen, S., Karaaslan, Y., Ozdemir, O., Saatci, U., Bakkaloglu, A.,
Koroglu, E. & Tezcan, S. (1998) Prevalence of juvenile chronic
arthritis and Familial Mediterranean Fever in Turkey: A field
study. J Rheumatol 25, 2445–2449.
Polašek, O., Hayward, C., Bellenguez, C., Vitart, V., Kolcic, I.,
McQuillan, R., Saftic, V., Gyllensten, U., Wilson, J.F., Rudan, I.,
Wright, A.F., Campbell, H. & Leutenegger, A.L. (2010) Comparative assessment of methods for estimating individual genome-wide Homozygosity-by-descent from human genomic data. BMC
Genomics 11, 139–148.
Romeo, G., Menozzi, P., Ferlini, A., Fadda, S., Di Donato, S.,
Uziel, G., Lucci, B., Capodaglio, L., Filla, A. & Campanella, G.
(1983a) Incidence of Friedreich ataxia in Italy estimated from
consanguineous marriages. Am J Hum Genet 35, 523–529.
Romeo, G., Menozzi, P., Ferlini, A., Prosperi, L., Cerone, R.,
Scalisi, S., Romano, C., Antonozzi, I., Riva, E., Piceni Sereni,
L., Zammarchi, E., Lenzi, G., Sartorio, R., Andria, G., Cioni,
M., Fois, A., Burroni, M., Burlina, A.B. & Carnevale, F. (1983b)
Incidence of classic PKU in Italy estimated from consanguineous
marriages and from neonatal screening. Clin Genet 24, 339–345.
Romeo, G., Bianco, M., Devoto, M., Menozzi, P., Mastella,
G., Giunta, A.M., Micalizzi, C., Antonelli, M., Battistini, A.,
Santamaria, F., Castello, D., Marianelli, A., Marchi, A.G., Manca,
A. & Milano, A. (1985) Incidence in Italy, genetic heterogeneity,
and segregation analysis of cystic fibrosis. Am J Hum Genet 37,
338–349.
Romeo, G. (2010) LSDBs: Promise and Challenges. Hum Mutat 31,
V.
Santos, L.L., Fonseca, C.G., Januário, J.N., Aguiar, M.J.B., Peixoto,
M.G.C.D. & Carvalho, M.R.S. (2010) Variations in genotype-phenotype correlations in Phenylketonuria patients. Genet Mol
Res 9, 1–8.
ten Kate, L.P., Teeuw, M., Henneman, L. & Cornel, M.C. (2010)
Autosomal recessive disease in children of consanguineous parents: Inferences from the proportion of compound heterozygotes.
J Community Genet 1, 37–40.
Touitou, I. (2001) The spectrum of Familial Mediterranean Fever
(FMF) mutations. Eur J Hum Genet 9, 473–483.
van Baal, S., Zlotogora, J., Lagoumintzis, J., Gkantouna, V., Tzimas,
I., Poulas, K., Tsakalidis, A., Romeo, G. & Patrinos, G.P. (2010)
ETHNOS: A versatile electronic tool for the development and
curation of national genetic databases. Hum Genomics 4, 361–368.
Woods, C.G., Cox, J., Springell, K., Hampshire, D.J., Mohamed,
M.D., McKibbin, M., Stern, R., Raymond, F.L., Sandford, R.,
Malik Sharif, S., Karbani, G., Ahmed, M., Bond, J., Clayton,
D. & Inglehearn, C.F. (2006) Quantification of homozygosity in
consanguineous individuals with autosomal recessive disease. Am
J Hum Genet 78, 889–896.
Yilmaz, R., Ozer, S., Ozyurt, H., Erkorkmaz, U. & Sahin, S. (2009)
Familial Mediterranean fever gene mutations in the inner northern region of Turkey and genotype–phenotype correlation in children. J Paediatr Child Health 45, 641–645.
Zare-Karizi, Sh., Hosseini-Mazinani, S.M., Khazaei-Koohpar,
Z., Seifati, S.M., Shahsavan-Behboodi, B., Akbari, M.T. &
Koochmeshgi, J. (2010) Mutation spectrum of phenylketonuria
in Iranian population. Mol Genet Metab 102, 29–32.
Supporting Information
Additional Supporting Information may be found in the online version of this article:
Figure S1 Mutational spectra and ratios of Homozygotes
(HOM, red) versus Compound Heterozygotes (CH, green)
for each of the six samples examined (excluding alleles with
relative frequencies <1% and grouping alleles with relative
frequencies between 1% and 2% in the “others” group, when
present).
Figure S2 qsample CI95% vs q1 plot, for a sample size of
n = 25, calculated from a simulated affected population with
(a) F = 0.0625 (i.e., patients born to first-cousin parents) and
(b) F = 0.001 (i.e., patients born to unrelated parents).
Table S1 Weighted average F for (a) the Lebanese, (b) the
Turkish, (c) the Israeli Arab, and (d) the Israeli Jewish samples
of patients born to unrelated individuals. Data taken from the
http://www.consang.net tables.
Supplementary material S1 Sample size analysis.
Supplementary material S2 Pedigrees of consanguineous
patients whose parents are first cousins.
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such
materials are peer-reviewed and may be re-organised for
online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information
(other than missing files) should be addressed to the authors.
Received: 20 April 2011
Accepted: 17 October 2011
© 2011 The Authors. Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London. Annals of Human Genetics (2012) 76, 159–167