Statistical analysis of individual assignment tests among four
cattle breeds using fifteen STR loci¹
R. Ciampolini,* V. Cetica,* E. Ciani,* E. Mazzanti,* X. Fosella,‡ F. Marroni,‡ M. Biagetti,†
C. Sebastiani,† P. Papa,† G. Filippini,† D. Cianci,* and S. Presciuttini‡²
*Dipartimento di Produzioni Animali, Viale delle Piagge 2, 56100 Pisa, Italy;
†Istituto Zooprofilattico Sperimentale Umbria Marche (IZSUM), via Salvemini 1, 64126 Perugia, Italy;
and ‡Centro di Genetica Statistica, S.S. Abetone e Brennero 2, 56127 Pisa, Italy
ABSTRACT: Assignment tests based on multilocus
genotypes are becoming increasingly important to certify quality and origin of livestock products and assure
food safety and authenticity. The purpose of this study
was to determine the potential of microsatellites (STR)
for determining the breed origin of beef products among
cattle breeds present in the market. We typed 19 STR
in 269 animals from 4 cattle breeds. Based on Wright’s
F-statistics, 4 loci were discarded, and the remaining
15 loci (FIT = 0.101, FST = 0.089, and FIS = 0.013) were
used to compute the likelihood that each multilocus
genotype of the total sample was drawn from its true
breed instead of another breed. To avoid occurrence of
zero likelihood when one or more alleles were missing
from a tested breed, sample allele frequencies were estimated assuming uniform prior distributions. Log-likelihood ratio [log(LR)] distributions of the individual assignments were determined for all possible breed contrasts, and their means and SD were used to infer the
true-positive and false-positive rates at several values
of the log(LR). The posterior probability that the animals of a presumed breed were actually drawn from
that breed instead of any other breed was then calculated. Given an observed value of log(LR) > 0 and assuming equal priors, these probabilities were >99.5%
in 10 of 12 possible breed contrasts. For the 2 most
closely related breeds (FST = 0.041), this probability was
96.3%, and the probability of excluding the origin of
an animal from an alleged breed when it was actually
derived from another breed was similar.
Key words: assignment test, cattle breed, genetic diversity, microsatellite loci, statistical power
© 2006 American Society of Animal Science. All rights reserved.
J. Anim. Sci. 2006. 84:11–19
¹ We gratefully acknowledge C. Brenner (http://dna-view.com/) for suggesting the use of Laplace’s rule of succession for estimating allele frequencies in the presence of missing alleles.
² Corresponding author: [email protected]
Received May 19, 2005.
Accepted August 19, 2005.
INTRODUCTION
Traceability of livestock individuals to their source breed is becoming more and more important in both domestic and global trade to assure food safety and authenticity (McKean, 2001). In addition, breed names are increasingly being used as brand names (Narrod and Fuglie, 2000), and this has stimulated interest by the beef industry to develop techniques to certify the breed origin of beef products. Therefore, validating tests for allocating individuals to known breeds is an important task in applied population genetics.
Highly variable genetic markers (STR) may provide sufficient information for tracing the population origin of single individuals if there is sufficient genetic heterogeneity among populations (Paetkau et al., 1995; Rannala and Mountain, 1997; Pritchard et al., 2000); for domesticated animals, breed assignment tests based on multilocus genotypes have been applied to a variety of species (Buchanan et al., 1994; Bjornstad and Roed, 2001; Koskinen, 2003), including cattle (Blott et al., 1999; Ciampolini et al., 2000; Maudet et al., 2002). Nonetheless, there is still uncertainty about which computational approach is preferable.
The objective of the current study was to assess the practicality of assigning individuals among 4 cattle breeds using STR. This goal was divided into 3 major tasks: 1) validating the markers used in the assignment tests through analysis of genetic heterogeneity, 2) calculating the likelihood that each animal originated from its true breed as well as from any of the others, and 3) performing a statistical analysis of the assignment tests in terms of sensitivity and specificity.
MATERIALS AND METHODS
Breeds and Animals
Chianina is a large-size, high-priced beef breed,
which originated in central Italy and is the source
of the renowned “Florentine steak.” Subjects for the
present work (n = 67) were sampled from several
farms, representative of the entire breeding area; all
individuals were recorded in the herd book. Charolais
and Limousin are beef breeds of French origin, often
imported into Italy by fattening farms; taken together,
they account for an important part of the Italian beef market. Subjects for the present work (n = 69 and 67,
respectively) were imported from France by different
farms at 8 to 10 mo of age. The Italian Friesian (Frisona) is the main dairy breed reared in Italy, but it
also is a relevant source of meat (male calves and end-career subjects). Frisona subjects (n = 66) were sampled on a single farm, based on its herd book. This
sampling scheme assured that the kinship coefficient
of all possible pairs of animals in each breed was lower
than 12.5%. Blood was collected within farms or at
slaughter.
Microsatellite Analysis
Samples of DNA were extracted following Jeanpierre (1987). Of the 19 STR typed, 7 were included
in the panel of 30 markers agreed at the 1996 meeting
of the International Society for Animal Genetics in
Tours, France (http://www.projects.roslin.ac.uk/cdiv/markers.html), and the others were selected based on
our previous experience (Ciampolini et al., 1995, 2000).
Primer sequences were obtained from Vaiman et al.
(1994; INRA-loci) and Steffen et al. (1993; ETH-loci).
Optimum PCR amplification conditions were determined for each marker separately. (They are available
on request.) Amplified fragments were separated by
capillary electrophoresis using an ABI PRISM 310 automated sequencer (Applied Biosystems, Foster City,
CA) following manufacturer’s protocols. Data were analyzed using GENESCAN 2.0 and GENOTYPER 2.0.
Genetic Structure of the Population Sample
In preliminary genetic analysis, allele frequencies
at the typed loci were estimated by direct count, and
observed (H) and expected (E{H}) locus heterozygosities were determined as the proportion of heterozygous individuals and as $1 - \sum_i p_i^2$ (where $p_i$ is the frequency of the ith allele at the locus), respectively. The
degree of genetic variation within and among breeds
was measured by Wright’s F-statistics. The homozygote excess in the total sample (FIT) was computed as
FIT = 1 − HT/E{HT}, where HT was the heterozygosity
of the total sample (HT is equivalent to the weighted
mean of the observed heterozygosities computed on
individual samples). Global and pairwise FST were calculated by ARLEQUIN (Schneider et al., 2000) as the
ratio of the variance of gene frequencies among (or
between) subsamples to the total variance, and FIS
was calculated as FIS = (FIT − FST)/(1 − FST). We also
calculated the excess of homozygotes in each breed [a
quantity denoted fITi, the small letter f indicating a
lower level in the hierarchical analysis (Presciuttini
et al., 1990)], as fITi = 1 − Hi/E{Hi}, where Hi was the
heterozygosity of breed i. The mean value of the 4 fITi
is theoretically equivalent to FIS.
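The quantities defined above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in the text, not the ARLEQUIN implementation; the function names are hypothetical, and the numerical check uses the FIT and FST values reported for locus INRA011 in Table 1.

```python
def expected_heterozygosity(allele_freqs):
    """E{H} = 1 - sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in allele_freqs)


def homozygote_excess(h_obs, h_exp):
    """F = 1 - H / E{H}; gives FIT for the total sample or fIT for one breed."""
    return 1.0 - h_obs / h_exp


def f_is(f_it, f_st):
    """FIS = (FIT - FST) / (1 - FST)."""
    return (f_it - f_st) / (1.0 - f_st)


# INRA011 in Table 1: FIT = 0.250 and FST = 0.128 give FIS of about 0.140
print(round(f_is(0.250, 0.128), 3))
```

The same `homozygote_excess` function serves both levels of the hierarchy, because FIT and fIT differ only in which sample supplies the observed and expected heterozygosities.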
Linkage disequilibrium (or gametic imbalance for
unlinked loci) was evaluated between all pairs of loci,
separately for the 4 breeds, using ARLEQUIN.
Genotype Likelihoods
The log-likelihood of a multilocus genotype, conditional on the individual originating from a particular
breed, was computed as the sum, over all loci, of the
logarithm of probability of each locus genotype in that
breed (Paetkau et al., 1995). In this approach, a problem arises when 1 of the 2 alleles, or both, are absent
from a putative source sample because the calculation
produces a multilocus zero likelihood. Several solutions have been proposed, from adding the test genotype to all samples (Paetkau et al., 1995), to assigning
arbitrarily low values to the missing alleles (Paetkau
et al., 1998; Davies et al., 1999; Paetkau et al., 2004),
to replacing them with the inverse number of gene
copies in each sample or a related value (Waser and
Strobeck, 1998; Guinand et al., 2002), to calculating
genotype probabilities as the sampling of 2 alleles from
a compound multinomial-Dirichlet distribution (Rannala and Mountain, 1997). We first calculated the genotype log-likelihoods using ARLEQUIN; however,
the method adopted by this program to take into account the missing alleles is not explicit, and we recalculated the log-likelihoods using the following estimates of the allele frequencies: $p_i = (f_i + 1)/(n_i + a)$,
where $f_i$ is the number of copies of an allele observed
in breed i, $n_i$ is the number of gene copies for that
locus in that breed (equal to twice its sample size),
and a is the number of alleles at that locus observed
in the total sample of the 4 breeds. This is an application of Laplace’s rule of succession (Edwards, 1992)
and corresponds to the posterior estimation of the allele frequencies when a uniform prior is assumed; it
produces estimates of allele frequencies similar to the
method of Rannala and Mountain (1997), but it is more
conservative. Calculations were performed in an Excel
spreadsheet (Microsoft, Redmond, WA) after checking
that our algorithm produced the same results as ARLEQUIN when no missing alleles were present in
any sample.
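A minimal sketch of this calculation is given below. It is not the authors' spreadsheet: the function names and the toy data structures are hypothetical, locus genotype probabilities are assumed to follow Hardy-Weinberg proportions (p² for homozygotes, 2pq for heterozygotes), and base-10 logarithms are assumed because the text does not state the base.

```python
import math


def laplace_frequencies(allele_counts, all_alleles):
    """Posterior mean allele frequencies under a uniform prior:
    p = (f + 1) / (n + a), where f is the allele count in the breed sample,
    n is the number of gene copies typed in that breed, and a is the number
    of alleles observed at the locus in the total sample."""
    n = sum(allele_counts.values())
    a = len(all_alleles)
    return {allele: (allele_counts.get(allele, 0) + 1) / (n + a)
            for allele in all_alleles}


def genotype_log_likelihood(genotype, freqs_by_locus):
    """Sum over loci of log10 of the locus genotype probability
    (assumption: Hardy-Weinberg proportions within the breed)."""
    total = 0.0
    for locus, (a1, a2) in genotype.items():
        p = freqs_by_locus[locus][a1]
        q = freqs_by_locus[locus][a2]
        prob = p * p if a1 == a2 else 2.0 * p * q
        total += math.log10(prob)
    return total


# Toy example: allele 104 is absent from this breed sample but present elsewhere
freqs = laplace_frequencies({100: 6, 102: 4}, all_alleles=[100, 102, 104])
print(freqs[104])  # 1/13 rather than 0, so the likelihood stays finite
print(genotype_log_likelihood({"locus1": (100, 104)}, {"locus1": freqs}))
```

Because every allele seen anywhere in the total sample receives a pseudo-count of 1, the estimated frequencies still sum to 1 within each breed and no genotype receives a likelihood of zero.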
Assignment Tests
The log-likelihoods of the observed genotypes (4 values for each individual, 1 computed for its true breed
and 3 computed for the other breeds) were used to
obtain the distributions of the logarithm of the likelihood ratio [log(LR)] of true vs. false assignments for
all possible breed pairs. Means and SD of the observed
log(LR) distributions were used to obtain the proportions of false positives (α) and true positives (1 − β) at
several test values of the log(LR). Normality of log(LR) distributions was assessed by the Kolmogorov-Smirnov test (Sokal and Rohlf, 1995). The posterior probability Pr(breedJ) that an individual originated from breed J, given the alternative hypothesis that it originated from breed K, and given the ratio γ = (1 − β)/α, was calculated as Pr(breedJ) = γ/(γ + 1), which assumes equal priors (Weir, 1996).
Table 1. Wright’s F-statistics computed for 4 cattle breeds
| Locus⁴ | Name | Na¹ | E{H}² | FIT | FST | FIS | Charolais fIT | Charolais Pa³ | Chianina fIT | Chianina Pa | Frisona fIT | Frisona Pa | Limousin fIT | Limousin Pa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D1S6 | INRA011# | 14 | 0.615 | 0.250 | 0.128 | 0.140 | 0.069 | 1 | −0.050 | 3 | 0.121 | 0 | 0.447*** | 2 |
| D3S9 | INRA006 | 8 | 0.460 | 0.112 | 0.108 | 0.005 | 0.052 | 0 | 0.006 | 2 | 0.070 | 0 | 0.057 | 1 |
| D4S11 | INRA072 | 10 | 0.706 | 0.105 | 0.152 | −0.054 | −0.002 | 1 | −0.030 | 1 | −0.099 | 0 | 0.039 | 2 |
| D4S26 | INRA037 | 11 | 0.704 | 0.076 | 0.051 | 0.027 | 0.143* | 1 | −0.025 | 0 | 0.057 | 0 | −0.027 | 1 |
| D5S1 | ETH152 | 9 | 0.743 | 0.135 | 0.055 | 0.084 | 0.106 | 0 | 0.016 | 2 | 0.069 | 0 | 0.218* | 1 |
| D7S6 | INRA053 | 7 | 0.718 | 0.193 | 0.170 | 0.027 | 0.097 | 0 | 0.217* | 2 | −0.061 | 0 | 0.002 | 0 |
| D9S1 | ETH225 | 9 | 0.807 | 0.120 | 0.094 | 0.028 | 0.078 | 0 | −0.071 | 2 | 0.011 | 0 | 0.189* | 0 |
| D11S9 | INRA032 | 10 | 0.704 | 0.034 | 0.057 | −0.024 | 0.078 | 0 | 0.013 | 1 | −0.131 | 0 | −0.012 | 2 |
| D12S4 | INRA005 | 3 | 0.607 | 0.075 | 0.032 | 0.045 | 0.103 | 0 | −0.065 | 0 | 0.132 | 0 | 0.068 | 0 |
| D15S5 | INRA050 | 15 | 0.791 | 0.149 | 0.076 | 0.079 | 0.082 | 1 | 0.134 | 1 | 0.069 | 2 | 0.110 | 1 |
| D16S10 | INRA013 | 11 | 0.745 | 0.071 | 0.104 | −0.036 | 0.020 | 0 | 0.164 | 1 | −0.073 | 0 | −0.112 | 1 |
| D16S114 | INRA035# | 6 | 0.400 | 0.461 | 0.033 | 0.442 | 0.457** | 1 | 0.519*** | 0 | 0.126 | 0 | 0.625*** | 0 |
| D17S64 | INRA025# | 9 | 0.786 | 0.376 | 0.121 | 0.289 | 0.498*** | 0 | 0.105 | 0 | 0.402*** | 0 | 0.285*** | 1 |
| D18S5 | INRA063 | 6 | 0.633 | 0.008 | 0.028 | −0.021 | −0.058 | 0 | −0.114 | 0 | 0.128 | 0 | −0.001 | 0 |
| D21S4 | ETH131 | 12 | 0.827 | 0.048 | 0.068 | −0.022 | −0.006 | 0 | 0.041 | 2 | −0.062 | 0 | 0.016 | 1 |
| D21S124 | INRA031# | 15 | 0.780 | 0.247 | 0.093 | 0.170 | 0.055 | 0 | 0.291*** | 3 | 0.176 | 0 | 0.202** | 3 |
| D23S15 | INRA064 | 6 | 0.753 | 0.220 | 0.182 | 0.046 | 0.095 | 0 | 0.099 | 0 | −0.061 | 0 | 0.176* | 0 |
| D27S20 | INRA016 | 10 | 0.810 | 0.018 | 0.065 | −0.051 | 0.091 | 0 | −0.018 | 0 | −0.195** | 1 | −0.017 | 0 |
| D27S16 | INRA027 | 6 | 0.764 | 0.154 | 0.095 | 0.065 | 0.185* | 0 | 0.038 | 0 | 0.051 | 0 | 0.077 | 0 |
| Averages | | 9.3 | 0.703 | 0.150 | 0.090 | 0.065 | 0.113 | 0.3 | 0.067 | 1.1 | 0.039 | 0.2 | 0.123 | 0.8 |
¹ Total number of observed alleles.
² Expected heterozygosity.
³ Number of alleles observed in that breed only.
⁴ Loci marked with # have not been used in the assignment tests.
*P < 0.05 (homozygosity test); **P < 0.01 (homozygosity test); ***P < 0.001 (homozygosity test).
Simulated Animals
Simulations of 10,000 multilocus genotypes for each
breed were performed assuming Hardy-Weinberg
equilibrium and locus independence, using the estimated allele frequencies. Assignment tests and
log(LR) distributions were calculated for the simulated
animals in the same way as for the true animals.
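A simulation of this kind might look like the sketch below. It is an illustration under the stated assumptions (Hardy-Weinberg equilibrium and independence among loci), not the program actually used; the function name, the seed argument, and the toy allele frequencies are hypothetical.

```python
import random


def simulate_genotypes(freqs_by_locus, n_animals, seed=None):
    """Draw multilocus genotypes assuming Hardy-Weinberg equilibrium
    (two independent allele draws per locus) and independence among loci."""
    rng = random.Random(seed)
    animals = []
    for _ in range(n_animals):
        genotype = {}
        for locus, freqs in freqs_by_locus.items():
            alleles = list(freqs)
            weights = [freqs[a] for a in alleles]
            # Two alleles sampled with replacement in proportion to frequency
            genotype[locus] = tuple(rng.choices(alleles, weights=weights, k=2))
        animals.append(genotype)
    return animals


# Toy frequencies for one breed at two loci (invented values)
toy_freqs = {"locus1": {100: 0.7, 102: 0.3}, "locus2": {200: 0.5, 204: 0.5}}
simulated = simulate_genotypes(toy_freqs, n_animals=10000, seed=1)
print(len(simulated), simulated[0])
```

Each locus is sampled independently of the others, so any allelic association among loci in the real animals is absent from the simulated ones by construction.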
RESULTS
Genetic Variation Among and Within Breeds
A total of 269 animals, evenly distributed among the
4 breeds (Charolais = 69, Chianina = 67, Frisona = 66,
and Limousin = 67), was typed for 19 microsatellite
loci (Table 1). Observed number of alleles per locus in
the total sample was between 3 (INRA005) and 15
(INRA031 and INRA050), and the expected heterozygosity (E{H}) was between 0.400 (INRA035) and 0.827
(ETH131). A remarkable proportion of all alleles (44
of 177; 25%) was found in a single breed only; among
these, 20 alleles were private to Chianina (2 with frequencies >10%), 16 to Limousin, 5 to Charolais, and
3 to Frisona. Analysis using Wright’s F-statistics revealed a high level of homozygote excess in the total
sample (FIT = 15%, averaged over loci); this was in
large part due to variation of gene frequency among
breeds (FST = 9%) and to a lesser extent to homozygote
excess within breeds (FIS = 6.5%). Analysis of homozygote excess performed at the level of each breed (this
index is called fIT to distinguish it from the analogous
parameter computed for the total sample; see Table
1) revealed 2 loci (INRA035 and INRA025) with highly
significant homozygote excess in 3 breeds each.
This excess was very high on an absolute scale (e.g.,
62.5% in Limousin for INRA035 and 49.8% in Charolais for INRA025), suggesting that it was due to experimental artifacts rather than to real genetic effects.
Two other loci (INRA011 and INRA031) also displayed
FIS values >10% because of large homozygote excess
in some breeds; therefore, these 4 loci were excluded
from further analyses. After this modification, the new
mean values of Wright’s F-statistics were as follows:
FIT = 0.101, FST = 0.089, and FIS = 0.013.
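The effect of discarding the 4 loci can be checked directly from the per-locus values in Table 1. The short sketch below assumes that the reported means are unweighted averages over loci; the dictionary simply transcribes the FIT, FST, and FIS columns of Table 1, and the function name is hypothetical.

```python
# Per-locus values transcribed from Table 1: name -> (FIT, FST, FIS)
TABLE1 = {
    "INRA011": (0.250, 0.128, 0.140), "INRA006": (0.112, 0.108, 0.005),
    "INRA072": (0.105, 0.152, -0.054), "INRA037": (0.076, 0.051, 0.027),
    "ETH152": (0.135, 0.055, 0.084), "INRA053": (0.193, 0.170, 0.027),
    "ETH225": (0.120, 0.094, 0.028), "INRA032": (0.034, 0.057, -0.024),
    "INRA005": (0.075, 0.032, 0.045), "INRA050": (0.149, 0.076, 0.079),
    "INRA013": (0.071, 0.104, -0.036), "INRA035": (0.461, 0.033, 0.442),
    "INRA025": (0.376, 0.121, 0.289), "INRA063": (0.008, 0.028, -0.021),
    "ETH131": (0.048, 0.068, -0.022), "INRA031": (0.247, 0.093, 0.170),
    "INRA064": (0.220, 0.182, 0.046), "INRA016": (0.018, 0.065, -0.051),
    "INRA027": (0.154, 0.095, 0.065),
}
EXCLUDED = {"INRA011", "INRA035", "INRA025", "INRA031"}


def mean_f_statistics(table, excluded=frozenset()):
    """Unweighted means of (FIT, FST, FIS) over the loci that are kept."""
    kept = [values for name, values in table.items() if name not in excluded]
    return tuple(round(sum(v[i] for v in kept) / len(kept), 3) for i in range(3))


print(mean_f_statistics(TABLE1))            # all 19 loci
print(mean_f_statistics(TABLE1, EXCLUDED))  # 15 loci: (0.101, 0.089, 0.013)
```

Removing the 4 loci leaves FST almost unchanged while FIS drops sharply, which is consistent with the interpretation that the discarded loci inflated the within-breed homozygote excess.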
Because 2 pairs of loci were located on the same chromosome (INRA037/INRA072 and INRA016/INRA027), we tested all possible pairs of loci for linkage disequilibrium within each breed. A total of 420
pairwise tests were carried out (105 in each breed),
and 57 of them (13.6%) showed P-values < 0.05; this
proportion is significantly greater than expected by
chance (expected number = 21.0; χ² = 64.9; 1 df). The
2 syntenic locus pairs contributed only 2 significant
contrasts of 57, thereby revealing the presence of highly significant gametic imbalance in all breeds.
Genetic distance between breeds measured by pairwise FST (6 total comparisons) was calculated using the 15 validated loci. All 6 contrasts were significant at P < 0.001. Frisona (the dairy breed) was the most differentiated (FST = 0.134, 0.102, and 0.101 with Chianina, Charolais, and Limousin, respectively), whereas Limousin and Charolais were the least differentiated (FST = 0.041). Figure 1 shows the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) tree corresponding to the FST distance matrix.
Figure 1. Unrooted dendrogram of the 4 breeds based on the FST pairwise distance matrix calculated on 15 highly variable genetic marker (STR) loci.
Genotype Likelihoods
We first applied the likelihood calculation implemented in the ARLEQUIN package and compared the results with those obtained by our posterior estimation of allele frequencies. As an example, Figure 2 shows the results of Chianina vs. Limousin. Two log(LR) distributions are shown: the assignments of true Chianina animals to their breed (rather than to Limousin) and the false assignments of Limousin animals to Chianina, respectively. The 2 distributions are more separated from each other using ARLEQUIN, showing that our method of correcting for the missing alleles is more conservative. Therefore, we applied it to all subsequent analyses, using the set of 15 loci validated in the previous section. Figure 3 shows the plots of the log-likelihoods of all animals of each breed calculated for their true breed (dots) in comparison with the log-likelihoods that they are assigned to the other 3 breeds. It is evident that Frisona is the most discriminated from all other breeds, whereas Limousin and Charolais are the least discriminated; for example, the likelihood of 2 Limousin animals being Charolais is higher than that of being Limousin, and the likelihoods are very close for 2 other animals.
Assignment Tests
To estimate the statistical power of the assignment
tests at greater resolution than was possible given
the available sample sizes, we initially opted for a
simulation approach (Roques et al., 1999). For each
breed, we generated 10,000 multilocus genotypes and
calculated the log(LR) of breed assignment; however,
the average log(LR) obtained in the simulations was
lower than that of the real animals by about 3 log units
for all true assignments and by approximately 2 log
units for all false assignments (data not shown). We
then abandoned this approach and performed the statistical analysis by recourse to the theory of the normal
distribution, using the mean and SD of the log(LR)
calculated from the samples. Deviations from normality were subjected to a preliminary Kolmogorov-Smirnov test; none of the 12 tests (3 pairwise contrasts for
each breed) was significant. Figure 4 shows, as an
example, the log(LR) distributions obtained contrasting true Limousin animals against the Charolais
falsely attributed to Limousin; corresponding normal
distributions are superimposed. Limousin and Charolais have been chosen for display because these are
the least discriminated breeds. In fact, the log(LR)
of some Limousin animals is <0, meaning that their
genotype would be more likely among the Charolais
than among the Limousin. In the settings of Figure 4,
they are to be considered as false negatives. (They
become false positives when the presumed breed is
Charolais and the tested breed is Limousin.)
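The normal-theory step can be sketched as follows. The means and SD used in the example are invented for illustration and are not values from this study; the function name is hypothetical, and `statistics.NormalDist` from the Python standard library (version 3.8 or later) supplies the normal cumulative distribution function.

```python
from statistics import NormalDist


def rates_at_threshold(mean_true, sd_true, mean_false, sd_false, threshold):
    """Return (true-positive rate 1 - beta, false-positive rate alpha) for the
    rule 'accept the presumed breed if log(LR) > threshold', treating both
    log(LR) distributions as normal."""
    power = 1.0 - NormalDist(mean_true, sd_true).cdf(threshold)
    alpha = 1.0 - NormalDist(mean_false, sd_false).cdf(threshold)
    return power, alpha


# Hypothetical log(LR) summaries: animals of the presumed breed vs. animals
# of the alternative breed falsely tested against the presumed breed
for test_value in (0, 1, 2):
    power, alpha = rates_at_threshold(8.0, 3.0, -7.0, 3.0, test_value)
    print(test_value, round(power, 4), round(alpha, 4))
```

Raising the test value lowers both rates, so a more stringent threshold trades some power for a smaller proportion of false positives.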
Table 2 summarizes the results of the present work.
The proportions of false positives (α) and true positives
(1 − β) are shown for all possible breed contrasts, including both the situations in which the presumed
breed of an animal (e.g., Chianina) is tested against
an alternative hypothesis (e.g., Limousin) and the opposite situation (the assumed breed is Limousin and
the tested breed is Chianina). The column “Probability” shows the value of the posterior probability that
any animal with log(LR) greater than the test value
is derived from the presumed breed, assuming equal
priors of the 2 breeds. Three test values of the log(LR)
are chosen for display (0, 1, and 2), corresponding to
increasingly stringent significance levels (α) of the assignments. It
is evident from these data that discriminating between
Limousin and Charolais is the most difficult situation,
particularly when the presumed breed is Charolais.
In this case, even a log(LR) >2 produces a posterior
probability of assignment <98.5%. In all other contrasts, a probability >99.5% is attained at the test
value log(LR) >0 (except the Frisona vs. Limousin
case). At the test value log(LR) >2, the probability of
assignment is >99.9% in 11 of 12 contrasts, and the
power of the test (1 − β) is >99% in 9 of 12 contrasts.
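The "Probability" column follows from the two rates by a one-line calculation, sketched here with a hypothetical function name. With equal priors, γ/(γ + 1) with γ = (1 − β)/α simplifies to (1 − β)/(1 − β + α); the simplified form is used so that α = 0 causes no division by zero.

```python
def posterior_probability(power, alpha):
    """Pr(presumed breed | log(LR) > test value) with equal priors:
    gamma / (gamma + 1) with gamma = power / alpha, written in the
    equivalent form power / (power + alpha)."""
    return power / (power + alpha)


# Chianina vs. Limousin at log(LR) > 0 in Table 2:
# true positives = 0.9981, false positives = 0.0024
print(round(posterior_probability(0.9981, 0.0024), 4))  # 0.9976
```

Small differences in the last decimal place can arise for other rows when the calculation starts from the rounded rates in Table 2 rather than from unrounded values.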
Figure 2. Results of the assignment tests by ARLEQUIN (open circles) and by our calculation (filled circles). The distributions of the log-likelihood ratio [log(LR)] values that Chianina animals are correctly assigned to the Chianina breed (right curves) are compared with the distributions of Limousin animals falsely classified as Chianina (left). Points represent the moving averages of 3 log(LR) intervals, each of 5 log units.
Figure 3. Individual log-likelihood of breed assignments. Charts show the 4 likelihood values calculated for each animal of each breed, including the likelihood that they are drawn from their true breed (dotted line, ordered by increasing values) and the likelihood that they are drawn from the other 3 breeds (unbroken and dashed lines).
Figure 4. The log-likelihood ratio [log(LR)] distributions of the true Limousin vs. Charolais contrast. Bars = true animals; curves = normal distributions with mean and standard deviations estimated from true animals.
Table 2. Power analysis of breed allocation
| Contrast | Test value of log(LR)¹ | True positives (1 − β) | False positives (α) | Probability² |
|---|---|---|---|---|
| Chianina vs. Limousin | 0 | 0.9981 | 0.0024 | 0.9976 |
| | 1 | 0.9968 | 0.0015 | 0.9985 |
| | 2 | 0.9946 | 0.0008 | 0.9991 |
| Limousin vs. Chianina | 0 | 0.9976 | 0.0019 | 0.9981 |
| | 1 | 0.9960 | 0.0011 | 0.9989 |
| | 2 | 0.9936 | 0.0006 | 0.9994 |
| Chianina vs. Charolais | 0 | 0.9987 | 0.0011 | 0.9989 |
| | 1 | 0.9975 | 0.0005 | 0.9995 |
| | 2 | 0.9956 | 0.0003 | 0.9997 |
| Charolais vs. Chianina | 0 | 0.9989 | 0.0013 | 0.9987 |
| | 1 | 0.9978 | 0.0007 | 0.9993 |
| | 2 | 0.9959 | 0.0004 | 0.9996 |
| Chianina vs. Frisona | 0 | 1.0000 | 0.0000 | 1.0000 |
| | 1 | 0.9999 | 0.0000 | 1.0000 |
| | 2 | 0.9998 | 0.0000 | 1.0000 |
| Frisona vs. Chianina | 0 | 1.0000 | 0.0000 | 1.0000 |
| | 1 | 1.0000 | 0.0000 | 1.0000 |
| | 2 | 0.9999 | 0.0000 | 1.0000 |
| Limousin vs. Charolais | 0 | 0.9616 | 0.0050 | 0.9949 |
| | 1 | 0.9444 | 0.0019 | 0.9980 |
| | 2 | 0.9217 | 0.0007 | 0.9993 |
| Charolais vs. Limousin | 0 | 0.9950 | 0.0384 | 0.9629 |
| | 1 | 0.9882 | 0.0258 | 0.9746 |
| | 2 | 0.9744 | 0.0169 | 0.9830 |
| Limousin vs. Frisona | 0 | 0.9920 | 0.0000 | 1.0000 |
| | 1 | 0.9890 | 0.0000 | 1.0000 |
| | 2 | 0.9849 | 0.0000 | 1.0000 |
| Frisona vs. Limousin | 0 | 1.0000 | 0.0080 | 0.9921 |
| | 1 | 1.0000 | 0.0057 | 0.9944 |
| | 2 | 0.9999 | 0.0040 | 0.9960 |
| Charolais vs. Frisona | 0 | 0.9985 | 0.0001 | 0.9999 |
| | 1 | 0.9975 | 0.0000 | 1.0000 |
| | 2 | 0.9960 | 0.0000 | 1.0000 |
| Frisona vs. Charolais | 0 | 0.9999 | 0.0015 | 0.9985 |
| | 1 | 0.9998 | 0.0009 | 0.9991 |
| | 2 | 0.9995 | 0.0005 | 0.9995 |
¹ LR = likelihood ratio.
² Computed as (1 − β)/(1 − β + α).
DISCUSSION
In the current study, we determined the potential of assigning individuals among cattle breeds using STR. We adopted an apparently straightforward approach, consisting of the following steps: 1) use all available genetic information (19 typed loci) as the basis of individual assignment, 2) use a popular software package to calculate the likelihoods of the multilocus genotypes, 3) use computer simulations to generate large samples of animals to obtain accurate significance levels of the assignment tests. Experience gained during this work showed that none of these steps could be applied without caveats.
Analysis of genetic variation among and within
breeds highlighted 4 loci with exceedingly high FIS
values. Inspection of observed and expected heterozygosity at the individual breed level (fIT) showed that
this effect was due to a large excess of homozygous
genotypes in all 4 breeds for the loci INRA035 and
INRA025. The INRA035 showed a large excess of homozygous individuals in 2 other reports (Jordana et
al., 2003; Casellas et al., 2004) with FIS = 0.384 and
0.643, respectively. A likely explanation is that some
heterozygous individuals are classified as homozygotes because of failed amplification of one allele (Jordana et al., 2003; SanCristobal et al., 2003). These
results highlight the importance of reporting locus-bylocus F-statistics in the genetic analyses of cattle, to
identify the loci that may be more likely subjected to
experimental artifacts. After removal of the problematic loci, a residual small excess of homozygotes (FIS =
1.3%) remained, but this is expected in samples composed of individuals coming from several different
rearing farms and could be due to within-breed Wahlund’s effect (i.e., to population stratification at the
breed level). Considering the index of genetic differentiation among breeds measured by FST (0.09), our estimate was similar to that obtained in other studies
performed in cattle (Schmid et al., 1999; Maudet et
al., 2002).
The likelihood calculation performed by a popular
program has proven to be another critical issue. As
many as 25% of the alleles observed in the total sample
were present in one breed only, so that the occurrence
of genotypes with zero likelihood would be quite frequent without correction for the missing alleles. This
question must be considered carefully in the validation
of assignment tests in cattle, as any given choice may
have important consequences on the distributions of
the log(LR), and practical applications of assignment
tests may ultimately imply charges of fraud (Primmer
et al., 2000; Manel et al., 2002); a conservative approach is therefore advisable. We adopted the solution
of estimating allele frequencies as posterior probabilities, given the actual counts in a sample, and assuming
a uniform prior distribution. This method has the advantage of providing a correction that reduces the importance of all rare alleles in the assignment tests, not
only of the alleles that are missing in a sample. For
example, if an allele is observed once in one subsample and twice in
another of the same size, the odds of their frequencies
are 1:2 using the usual frequency calculation, whereas
the odds become 2:3 using this method.
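The arithmetic behind this example can be written out with the estimator p = (f + 1)/(n + a) given in Materials and Methods, assuming the two subsamples share the same number of gene copies n and the same number of alleles a (the subscripts 1 and 2 below simply label the two subsamples):

```latex
\[
\frac{\hat p_1}{\hat p_2}
  = \frac{(1 + 1)/(n + a)}{(2 + 1)/(n + a)}
  = \frac{2}{3},
\qquad \text{whereas by direct count} \qquad
\frac{1/n}{2/n} = \frac{1}{2}.
\]
```

The common denominator cancels, so the shrinkage of the ratio toward 1 depends only on the added pseudo-count and is strongest for the rarest alleles.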
The third step of our original design was to obtain
accurate estimates of false-positive and true-positive
rates using large simulated samples of the 4 breeds.
However, the log(LR) distributions obtained for the
simulated animals were significantly different from
those calculated for the real animals, thus questioning
the validity of this approach. A possible explanation of
this discrepancy is the large level of gametic imbalance
detected within all 4 breeds, in accordance with a genome-wide study in cattle (Farnir et al., 2000). This
hypothesis could be verified only by taking into account
the allelic association in generating the simulated animals; we will consider this aspect in future extensions
of the present work. In any case, our results suggest
that computer simulations of multilocus genotypes
that assume Hardy-Weinberg and linkage equilibrium
should be used with caution.
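The excess of significant linkage-disequilibrium tests reported in the Results (57 of 420 at P < 0.05) can be reproduced approximately with a 1-df chi-square goodness-of-fit calculation. The sketch below assumes the standard two-category form (significant vs. non-significant tests) and treats the tests as independent; whether the authors computed the statistic in exactly this way is not stated, and the function name is hypothetical.

```python
def excess_significant_tests(n_tests, n_significant, alpha=0.05):
    """One-df chi-square statistic comparing the observed numbers of
    significant and non-significant tests with those expected if every
    null hypothesis were true."""
    exp_sig = n_tests * alpha
    exp_non = n_tests - exp_sig
    obs_non = n_tests - n_significant
    chi2 = ((n_significant - exp_sig) ** 2 / exp_sig
            + (obs_non - exp_non) ** 2 / exp_non)
    return exp_sig, chi2


expected, chi2 = excess_significant_tests(420, 57)
print(round(expected, 1), round(chi2, 2))
```

With these inputs the expected number is 21 and the statistic is close to 65, in line with the reported expected number of 21.0 and χ² of 64.9.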
The final estimated capability of the 15 validated
STR to allocate individuals from the 4 chosen breeds
to their correct source is remarkable. A recent work
on 3 Italian cattle breeds different from those investigated here (Moioli et al., 2004) reported a much lower
power of individual assignments. The discrepancy may
in part be explained by the lower genetic differentiation measured between these breeds (FST = 0.06); however, we suggest that the main reason is that those
researchers have explored the potential of assigning
individuals to a breed using a program that infers a
population structure of a sample based on the pooled
genotypic data (and then calculates the probability
that each animal belongs to each inferred group). Instead, we have used the known information on
the breed origin of each individual. The first approach
has been developed to deal with wild species, in which
the population origin of any given animal is generally
not apparent; in contrast, the phenotype differences
among domesticated animals generally allow for unequivocal breed identification, meaning that the statistical properties of any assignment test can effectively
be assessed by determining sensitivity and specificity
in samples of known source.
How can an assignment test be implemented in practice? Let us suppose that we want to verify that a
meat-cut comes from a Chianina animal, as might
be stated on the label. This is far from an implausible
scenario, as the market price of Chianina is almost
twice that of other breeds present on the Italian beef
market. After the just-mentioned 15 loci are typed,
the log(LR) that this sample is from a Chianina rather
than from any of the other 3 breeds is calculated using
the allele frequencies estimated in the present work.
Then, we see from Table 2 that a value of log(LR) >0
has probability >99.8% if the cut is truly from Chianina
rather than, for example, from Limousin, whereas it
has probability of approximately 0.24% if it is from Limousin. Assuming
equal priors for those 2 breeds, we see that the posterior probability that a sample with log(LR) >0 is truly
from Chianina is 99.76%. The corresponding probabilities of assignments to Chianina when the contrasting hypotheses are Charolais and Frisona are
99.9 and 99.999%, respectively. If we assume more
stringent assignment criteria (e.g., log(LR) >1 or >2),
the posterior probabilities that the meat-cut is from
Chianina become overwhelming. Suppose, conversely,
that a meat-cut labeled as Chianina comes from a
breed other than Chianina. Then, the probabilities of
obtaining a log(LR) >0 for the Chianina are 0.24, 0.11,
and <0.01% for samples from Limousin, Charolais, and
Frisona, respectively; therefore, a log(LR) <0 has a
very high probability of being obtained, and the posterior probabilities that the meat-cut is from one of the
other breeds are 99.81, 99.87, and >99.99%, respectively. This evidence is certainly compelling.
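The decision step of such a test can be sketched as follows, assuming the log-likelihood of the sample under each candidate breed has already been computed from the breed allele frequencies. The function names are hypothetical and the log-likelihood values are invented for illustration.

```python
def pairwise_log_lr(log_likelihoods, presumed):
    """log(LR) of the presumed breed against each alternative breed,
    given one log-likelihood per candidate breed."""
    return {breed: log_likelihoods[presumed] - value
            for breed, value in log_likelihoods.items() if breed != presumed}


def supports_label(log_likelihoods, presumed, test_value=0.0):
    """True if log(LR) exceeds the test value against every alternative."""
    ratios = pairwise_log_lr(log_likelihoods, presumed)
    return all(lr > test_value for lr in ratios.values())


# Invented log-likelihoods for a meat-cut labeled as Chianina
sample = {"Chianina": -18.2, "Limousin": -25.0,
          "Charolais": -24.1, "Frisona": -30.5}
print(pairwise_log_lr(sample, "Chianina"))
print(supports_label(sample, "Chianina", test_value=2.0))
```

Each pairwise log(LR) would then be read against the corresponding contrast in Table 2 to obtain the true-positive rate, false-positive rate, and posterior probability at the chosen test value.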
The present analysis strictly applies to the 4 considered breeds; however, the power of the assignment
tests depends closely on the level of genetic heterogeneity existing between breeds, which is conveniently
measured by FST. Thus, this index can be used to anticipate the potential of allocating individuals among cattle breeds (Bjornstad and Roed, 2002). An important
area of future research will be the optimization of the
cost:benefit of the assignment test. For example, our
work shows that a given predetermined level of statistical significance (e.g., α < 1%) can be obtained with a
number of STR <15 for a substantial fraction of the
animals. This means that a sequential test based on
the consecutive typing of several sets of markers (say,
5 markers per set) could decrease the number of total
typed genotypes substantially, while keeping constant
the value of α. This procedure could eventually produce
a cost-effective, routinely implemented test. Another
area of future development will be to implement a test
that can answer the question “what is the probability
that this animal is actually from this breed?” in a
single step, thereby avoiding the need to perform
multiple pairwise tests. This problem could be addressed by collecting the allele frequencies of all possible alternative breeds and by forming a pooled database of the mean allele frequencies, weighted by the
population size of each contributing breed.
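One possible reading of the pooled database is sketched below for a single locus. This is only an illustration of a population-size-weighted mean of allele frequencies; the function name, the breed labels, and the numbers are hypothetical, and no such database was built in the present work.

```python
def pooled_frequencies(freqs_by_breed, population_sizes):
    """Mean allele frequencies at one locus across the alternative breeds,
    weighted by the population size of each contributing breed."""
    total = sum(population_sizes[breed] for breed in freqs_by_breed)
    alleles = set().union(*(freqs.keys() for freqs in freqs_by_breed.values()))
    return {
        allele: sum(population_sizes[breed] * freqs.get(allele, 0.0)
                    for breed, freqs in freqs_by_breed.items()) / total
        for allele in alleles
    }


# Invented example with two alternative breeds of unequal size
pooled = pooled_frequencies(
    {"breedA": {100: 0.5, 102: 0.5}, "breedB": {100: 1.0}},
    {"breedA": 100, "breedB": 300},
)
print(pooled[100], pooled[102])
```

A single log(LR) of the presumed breed against these pooled frequencies would then replace the set of pairwise contrasts.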
LITERATURE CITED
Bjornstad, G., and K. H. Roed. 2001. Breed demarcation and potential for breed allocation of horses assessed by microsatellite
markers. Anim. Genet. 32:59–65.
Bjornstad, G., and K. H. Roed. 2002. Evaluation of factors affecting
individual assignment precision using microsatellite data from
horse breeds and simulated breed crosses. Anim. Genet.
33:264–270.
Blott, S. C., J. L. Williams, and C. S. Haley. 1999. Discriminating
among cattle breeds using genetic markers. Heredity
82:613–619.
Buchanan, F. C., L. J. Adams, R. P. Littlejohn, J. F. Maddox, and
A. M. Crawford. 1994. Determination of evolutionary relationships among sheep breeds using microsatellites. Genomics
22:397–403.
Casellas, J., N. Jimenez, M. Fina, J. Tarres, A. Sanchez, and J.
Piedrafita. 2004. Genetic diversity measures of the bovine Alberes breed using microsatellites: Variability among herds and
types of coat colour. J. Anim. Breed. Genet. 121:101–110.
Ciampolini, R., H. Leveziel, E. Mazzanti, C. Grohs, and D. Cianci.
2000. Genomic identification of the breed of an individual or
its tissue. Meat Sci. 54:35–40.
Ciampolini, R., K. Moazami-Goudarzi, D. Vaiman, C. Dillmann,
E. Mazzanti, J. L. Foulley, H. Leveziel, and D. Cianci. 1995.
Individual multilocus genotypes using microsatellite polymorphisms to permit the analysis of the genetic variability within
and between Italian beef cattle breeds. J. Anim. Sci.
73:3259–3268.
Davies, N., F. X. Villablanca, and G. K. Roderick. 1999. Determining
the source of individuals: Multilocus genotyping in nonequilibrium population genetics. Trends Ecol. Evol. 14:17–21.
Edwards, A. W. F. 1992. Likelihood—Expanded Edition. The Johns
Hopkins Univ. Press, Baltimore, MD.
Farnir, F., W. Coppieters, J. J. Arranz, P. Berzi, N. Cambisano, B.
Grisart, L. Karim, F. Marcq, L. Moreau, M. Mni, C. Nezer, P.
Simon, P. Vanmanshoven, D. Wagenaar, and M. Georges. 2000.
Extensive genome-wide linkage disequilibrium in cattle. Genome Res. 10:220–227.
Guinand, B., A. Topchy, K. S. Page, M. K. Burnham-Curtis, W. F.
Punch, and K. T. Scribner. 2002. Comparisons of likelihood
and machine learning methods of individual classification. J.
Hered. 93:260–269.
Jeanpierre, M. 1987. A rapid method for the purification of DNA
from blood. Nucleic Acids Res. 15:9611.
Jordana, J., P. Alexandrino, A. Beja-Pereira, I. Bessa, J. Canon,
Y. Carretero, S. Dunner, D. Laloe, K. Moazami-Goudarzi, A.
Sanchez, and N. Ferrand. 2003. Genetic structure of eighteen
local south European beef cattle breeds by comparative F-statistics analysis. J. Anim. Breed. Genet. 120:73–87.
Koskinen, M. T. 2003. Individual assignment using microsatellite
DNA reveals unambiguous breed identification in the domestic
dog. Anim. Genet. 34:297–301.
Manel, S., P. Berthier, and G. Luikart. 2002. Detecting wildlife
poaching identifying the origin of individuals with Bayesian
assignment test and multilocus genotypes. Conserv. Biol.
16:650–659.
Maudet, C., G. Luikart, and P. Taberlet. 2002. Genetic diversity
and assignment tests among seven French cattle breeds based
on microsatellite DNA analysis. J. Anim. Sci. 80:942–950.
McKean, J. D. 2001. The importance of traceability for public health
and consumer protection. Rev. Sci. Tech. 20:363–371.
Moioli, B., F. Napolitano, and G. Catillo. 2004. Genetic diversity
between Piedmontese, Maremmana, and Podolica cattle
breeds. J. Hered. 95:250–256.
Narrod, C. A., and K. O. Fuglie. 2000. Private-sector investment in
livestock breeding. Agribusiness 4:457–470.
Paetkau, D., W. Calvert, I. Stirling, and C. Strobeck. 1995. Microsatellite analysis of population structure in Canadian polar
bears. Mol. Ecol. 4:347–354.
Paetkau, D., G. F. Shields, and C. Strobeck. 1998. Gene flow between
insular, coastal and interior populations of brown bears in
Alaska. Mol. Ecol. 7:1283–1292.
Paetkau, D., R. Slade, M. Burden, and A. Estoup. 2004. Genetic
assignment methods for the direct, real-time estimation of migration rate: A simulation-based exploration of accuracy and
power. Mol. Ecol. 13:55–65.
Presciuttini, S., G. Paoli, and M. Franceschi. 1990. Surname versus
gene structure of a small isolate. Ann. Hum. Genet. 54:79–90.
Primmer, C. R., M. T. Koskinen, and J. Piironen. 2000. The one
that did not get away: Individual assignment using microsatellite data detects a case of fishing competition fraud. Proc. R.
Soc. Lond. B Biol. Sci. 267:1699–1704.
Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of
population structure using multilocus genotype data. Genetics
155:945–959.
Rannala, B., and J. L. Mountain. 1997. Detecting immigration by
using multilocus genotypes. Proc. Natl. Acad. Sci. USA
94:9197–9201.
Roques, S., P. Duchesne, and L. Bernatchez. 1999. Potential of microsatellites for individual assignment: The North Atlantic redfish
(genus Sebastes) species complex as a case study. Mol. Ecol.
8:1703–1717.
SanCristobal, M., C. Chevalet, J. L. Foulley, and L. Ollivier. 2003.
Some methods for analysing genetic marker data in biodiversity
setting—example of the pigbiodiv data. Archivos de Zootecnia
52:173–183.
Schmid, M., N. Saitbekova, C. Gaillard, and G. Dolf. 1999. Genetic
diversity in Swiss cattle breeds. J. Anim. Breed. Genet. 116:1–8.
Schneider, S., D. Roessli, and L. Excoffier. 2000. Arlequin ver 2.000:
A software for population genetics data analysis. Genetics and
Biometry Laboratory, University of Geneva, Switzerland.
Sokal, R. R., and F. J. Rohlf. 1995. Biometry. 3rd ed. W. H. Freeman
and Co., New York, NY.
Steffen, P., A. Eggen, A. B. Dietz, J. E. Womack, G. Stranzinger,
and R. Fries. 1993. Isolation and mapping of polymorphic microsatellites in cattle. Anim. Genet. 24:121–124.
Vaiman, D., D. Mercier, K. Goudarzi, A. Eggen, R. Ciampolini, A.
Lepingle, R. Velmala, J. Kaukinen, S. L. Varvio, P. Martin, H.
Leveziel, and G. Guerin. 1994. A set of 99 cattle microsatellites:
Characterization, synteny mapping, and polymorphism. Mammal. Genome 5:288–297.
Waser, P. M., and C. Strobeck. 1998. Genetic signature of interpopulation dispersal. Trends Ecol. Evol. 13:43–44.
Weir, B. S. 1996. Genetic Data Analysis II. Sinauer, Sunderland,
MA.