Statistical analysis of individual assignment tests among four cattle breeds using fifteen STR loci¹

R. Ciampolini,* V. Cetica,* E. Ciani,* E. Mazzanti,* X. Fosella,‡ F. Marroni,‡ M. Biagetti,† C. Sebastiani,† P. Papa,† G. Filippini,† D. Cianci,* and S. Presciuttini‡²

*Dipartimento di Produzioni Animali, Viale delle Piagge 2, 56100 Pisa, Italy; †Istituto Zooprofilattico Sperimentale Umbria Marche (IZSUM), via Salvemini 1, 64126 Perugia, Italy; and ‡Centro di Genetica Statistica, S.S. Abetone e Brennero 2, 56127 Pisa, Italy

ABSTRACT: Assignment tests based on multilocus genotypes are becoming increasingly important to certify quality and origin of livestock products and assure food safety and authenticity. The purpose of this study was to determine the potential of microsatellites (STR) for determining the breed origin of beef products among cattle breeds present in the market. We typed 19 STR in 269 animals from 4 cattle breeds. Based on Wright’s F-statistics, 4 loci were discarded, and the remaining 15 loci (FIT = 0.101, FST = 0.089, and FIS = 0.013) were used to compute the likelihood that each multilocus genotype of the total sample was drawn from its true breed instead of another breed. To avoid occurrence of zero likelihood when one or more alleles were missing from a tested breed, sample allele frequencies were estimated assuming uniform prior distributions. Log-likelihood ratio [log(LR)] distributions of the individual assignments were determined for all possible breed contrasts, and their means and SD were used to infer the true-positive and false-positive rates at several values of the log(LR). The posterior probability that the animals of a presumed breed were actually drawn from that breed instead of any other breed was then calculated. Given an observed value of log(LR) > 0 and assuming equal priors, these probabilities were >99.5% in 10 of 12 possible breed contrasts.
For the 2 most closely related breeds (FST = 0.041), this probability was 96.3%, and the probability of excluding the origin of an animal from an alleged breed when it was actually derived from another breed was similar.

Key words: assignment test, cattle breed, genetic diversity, microsatellite loci, statistical power

© 2006 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2006. 84:11–19

¹We gratefully acknowledge C. Brenner (http://dna-view.com/) for suggesting the use of Laplace’s rule of succession for estimating allele frequencies in the presence of missing alleles.
²Corresponding author: [email protected]
Received May 19, 2005. Accepted August 19, 2005.

INTRODUCTION

Traceability of livestock individuals to their source breed is becoming more and more important in both domestic and global trade to assure food safety and authenticity (McKean, 2001). In addition, breed names are increasingly being used as brand names (Narrod and Fuglie, 2000), and this has stimulated interest by the beef industry to develop techniques to certify the breed origin of beef products. Therefore, validating tests for allocating individuals to known breeds is an important task in applied population genetics.

Highly variable genetic markers (STR) may provide sufficient information for tracing the population origin of single individuals if there is sufficient genetic heterogeneity among populations (Paetkau et al., 1995; Rannala and Mountain, 1997; Pritchard et al., 2000); for domesticated animals, breed assignment tests based on multilocus genotypes have been applied to a variety of species (Buchanan et al., 1994; Bjornstad and Roed, 2001; Koskinen, 2003), including cattle (Blott et al., 1999; Ciampolini et al., 2000; Maudet et al., 2002). Nonetheless, there is still uncertainty about which computational approach is preferable.

The objective of the current study was to assess the practicality of assigning individuals among 4 cattle breeds using STR. This goal was divided into 3 major tasks: 1) validating the markers used in the assignment tests through analysis of genetic heterogeneity, 2) calculating the likelihood that each animal originated from its true breed as well as from any of the others, and 3) performing a statistical analysis of the assignment tests in terms of sensitivity and specificity.

MATERIALS AND METHODS

Breeds and Animals

Chianina is a large-size, high-priced beef breed, which originated in central Italy and is the source of the renowned “Florentine steak.” Subjects for the present work (n = 67) were sampled from several farms, representative of the entire breeding area; all individuals were recorded in the herd book. Charolais and Limousin are beef breeds of French origin, often imported into Italy by fattening farms; taken together, they account for an important part of the Italian beef market. Subjects for the present work (n = 69 and 67, respectively) were imported from France by different farms at 8 to 10 mo of age. The Italian Friesian (Frisona) is the main dairy breed reared in Italy, but it also is a relevant source of meat (male calves and end-of-career subjects). Frisona subjects (n = 66) were sampled on a single farm, based on its herd book. This sampling scheme ensured that the kinship coefficient of all possible pairs of animals in each breed was lower than 12.5%. Blood was collected within farms or at slaughter.

Microsatellite Analysis

Samples of DNA were extracted following Jeanpierre (1987). Of the 19 STR typed, 7 were included in the panel of 30 markers agreed at the 1996 meeting of the International Society for Animal Genetics in Tours, France (http://www.projects.roslin.ac.uk/cdiv/markers.html), and the others were selected based on our previous experience (Ciampolini et al., 1995, 2000).
Primer sequences were obtained from Vaiman et al. (1994; INRA-loci) and Steffen et al. (1993; ETH-loci). Optimum PCR amplification conditions were determined for each marker separately (they are available on request). Amplified fragments were separated by capillary electrophoresis using an ABI PRISM 310 automated sequencer (Applied Biosystems, Foster City, CA) following the manufacturer’s protocols. Data were analyzed using GENESCAN 2.0 and GENOTYPER 2.0.

Genetic Structure of the Population Sample

In preliminary genetic analysis, allele frequencies at the typed loci were estimated by direct count, and observed (H) and expected (E{H}) locus heterozygosities were determined as the proportion of total heterozygous individuals and as 1 − Σ p_i^2 (where p_i is the frequency of the i-th allele at the locus), respectively. The degree of genetic variation within and among breeds was measured by Wright’s F-statistics. The homozygote excess in the total sample (FIT) was computed as FIT = 1 − HT/E{HT}, where HT was the heterozygosity of the total sample (HT is equivalent to the weighted mean of the observed heterozygosities computed on individual samples). Global and pairwise FST were calculated by ARLEQUIN (Schneider et al., 2000) as the ratio of the variance of gene frequencies among (or between) subsamples to the total variance, and FIS was calculated as FIS = (FIT − FST)/(1 − FST). We also calculated the excess of homozygotes in each breed [a quantity denoted fITi, the small letter f indicating a lower level in the hierarchical analysis (Presciuttini et al., 1990)], as fITi = 1 − Hi/E{Hi}, where Hi was the heterozygosity of breed i. The mean value of the 4 fITi is theoretically equivalent to FIS. Linkage disequilibrium (or gametic imbalance for unlinked loci) was evaluated between all pairs of loci, separately for the 4 breeds, using ARLEQUIN.
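The heterozygosity and F-statistic formulas above can be illustrated with a minimal Python sketch. It is not the ARLEQUIN implementation (FST in particular was obtained from ARLEQUIN and is simply taken as an input here); the function names and the toy genotypes are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """E{H} = 1 - sum(p_i^2), with p_i estimated by direct count of gene copies."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def observed_heterozygosity(genotypes):
    """H = proportion of heterozygous individuals; each genotype is an allele pair."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def homozygote_excess(h_obs, h_exp):
    """F = 1 - H / E{H}; the same form is used for FIT and for the per-breed fITi."""
    return 1.0 - h_obs / h_exp


def f_is(f_it, f_st):
    """FIS = (FIT - FST) / (1 - FST)."""
    return (f_it - f_st) / (1.0 - f_st)


# Toy single-locus sample (hypothetical genotypes, not data from the study).
genotypes = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
alleles = [allele for genotype in genotypes for allele in genotype]
h_exp = expected_heterozygosity(alleles)
h_obs = observed_heterozygosity(genotypes)
print(h_exp, h_obs, homozygote_excess(h_obs, h_exp))

# Mean values reported for the 15 retained loci: FIT = 0.101, FST = 0.089.
print(round(f_is(0.101, 0.089), 3))  # 0.013
```

The last line reproduces the FIS value reported for the 15 retained loci from the reported FIT and FST.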
Genotype Likelihoods

The log-likelihood of a multilocus genotype, conditional on the individual originating from a particular breed, was computed as the sum, over all loci, of the logarithm of probability of each locus genotype in that breed (Paetkau et al., 1995). In this approach, a problem arises when 1 of the 2 alleles, or both, are absent from a putative source sample because the calculation produces a multilocus zero likelihood. Several solutions have been proposed, from adding the test genotype to all samples (Paetkau et al., 1995), to assigning arbitrarily low values to the missing alleles (Paetkau et al., 1998; Davies et al., 1999; Paetkau et al., 2004), to replacing them with the inverse number of gene copies in each sample or a related value (Waser and Strobeck, 1998; Guinand et al., 2002), to calculating genotype probabilities as the sampling of 2 alleles from a compound multinomial-Dirichlet distribution (Rannala and Mountain, 1997). We first calculated the genotype log-likelihoods using ARLEQUIN; however, the method adopted by this program to take into account the missing alleles is not explicit, and we recalculated the log-likelihoods using the following estimates of the allele frequencies: pi = (fi + 1)/(ni + a), where fi is the number of copies of an allele observed in breed i, ni is the number of gene copies for that locus in that breed (equal to twice its sample size), and a is the number of alleles at that locus observed in the total sample of the 4 breeds. This is an application of Laplace’s rule of succession (Edwards, 1992) and corresponds to the posterior estimation of the allele frequencies when a uniform prior is assumed; it produces estimates of allele frequencies similar to the method of Rannala and Mountain (1997), but it is more conservative.
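The allele-frequency estimator and the log-likelihood sum described above can be sketched in Python as follows. This is an illustration only, not the authors' spreadsheet: the function names and toy data are hypothetical, single-locus genotype probabilities are assumed to follow Hardy-Weinberg proportions (p² for homozygotes, 2pq for heterozygotes), and base-10 logarithms are assumed.

```python
import math
from collections import Counter


def laplace_freqs(breed_alleles, all_alleles):
    """p = (f + 1) / (n + a).

    f: copies of the allele observed in the breed sample,
    n: gene copies typed in that breed at this locus,
    a: number of distinct alleles observed at this locus in the total sample.
    """
    counts = Counter(breed_alleles)
    n, a = len(breed_alleles), len(all_alleles)
    return {allele: (counts[allele] + 1) / (n + a) for allele in all_alleles}


def locus_genotype_prob(genotype, freqs):
    """Assumed Hardy-Weinberg genotype probability at one locus."""
    x, y = genotype
    return freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]


def multilocus_log_likelihood(genotypes, freqs_by_locus):
    """Sum over loci of log10 of the single-locus genotype probability."""
    return sum(
        math.log10(locus_genotype_prob(g, f))
        for g, f in zip(genotypes, freqs_by_locus)
    )


# Toy locus (hypothetical): allele "C" is seen in the total sample but not in this breed.
freqs = laplace_freqs(list("AAAAAABBBB"), ["A", "B", "C"])
print(freqs)
# The genotype carrying the missing allele still gets a finite log-likelihood.
print(multilocus_log_likelihood([("A", "C")], [freqs]))
```

Because every allele count is incremented by 1, the estimated frequencies still sum to 1 over the a alleles, and an allele that is absent from a breed sample receives the small frequency 1/(n + a) instead of zero.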
Calculations were performed in an Excel spreadsheet (Microsoft, Seattle, WA) after checking that our algorithm produced the same results as ARLEQUIN when no missing alleles were present in any sample.

Assignment Tests

The log-likelihoods of the observed genotypes (4 values for each individual, 1 computed for its true breed and 3 computed for the other breeds) were used to obtain the distributions of the logarithm of the likelihood ratio [log(LR)] of true vs. false assignments for all possible breed pairs. Means and SD of the observed log(LR) distributions were used to obtain the proportions of false positives (α) and true positives (1 − β) at several test values of the log(LR). Normality of the log(LR) distributions was assessed by the Kolmogorov-Smirnov test (Sokal and Rohlf, 1995). The posterior probability Pr(breedJ) that an individual originated from breed J, given the alternative hypothesis that it originated from breed K, and given the ratio γ = (1 − β)/α, was calculated as Pr(breedJ) = γ/(γ + 1), which assumes equal priors (Weir, 1996).

Simulated Animals

Simulations of 10,000 multilocus genotypes for each breed were performed assuming Hardy-Weinberg equilibrium and locus independence, using the estimated allele frequencies. Assignment tests and log(LR) distributions were calculated for the simulated animals in the same way as for the true animals.

Table 1. Wright’s F-statistics computed for 4 cattle breeds

                                                           Charolais      Chianina       Frisona        Limousin
Locus4   Name      Na1   E{H}2  FIT    FST    FIS     fIT       Pa3  fIT       Pa   fIT       Pa   fIT       Pa
D1S6     INRA011§  14    0.615  0.250  0.128  0.140   0.069     1    −0.050    3    0.121     0    0.447***  2
D3S9     INRA006   8     0.460  0.112  0.108  0.005   0.052     0    0.006     2    0.070     0    0.057     1
D4S11    INRA072   10    0.706  0.105  0.152  −0.054  −0.002    1    −0.030    1    −0.099    0    0.039     2
D4S26    INRA037   11    0.704  0.076  0.051  0.027   0.143*    1    −0.025    0    0.057     0    −0.027    1
D5S1     ETH152    9     0.743  0.135  0.055  0.084   0.106     0    0.016     2    0.069     0    0.218*    1
D7S6     INRA053   7     0.718  0.193  0.170  0.027   0.097     0    0.217*    2    −0.061    0    0.002     0
D9S1     ETH225    9     0.807  0.120  0.094  0.028   0.078     0    −0.071    2    0.011     0    0.189*    0
D11S9    INRA032   10    0.704  0.034  0.057  −0.024  0.078     0    0.013     1    −0.131    0    −0.012    2
D12S4    INRA005   3     0.607  0.075  0.032  0.045   0.103     0    −0.065    0    0.132     0    0.068     0
D15S5    INRA050   15    0.791  0.149  0.076  0.079   0.082     1    0.134     1    0.069     2    0.110     1
D16S10   INRA013   11    0.745  0.071  0.104  −0.036  0.020     0    0.164     1    −0.073    0    −0.112    1
D16S114  INRA035§  6     0.400  0.461  0.033  0.442   0.457**   1    0.519***  0    0.126     0    0.625***  0
D17S64   INRA025§  9     0.786  0.376  0.121  0.289   0.498***  0    0.105     0    0.402***  0    0.285***  1
D18S5    INRA063   6     0.633  0.008  0.028  −0.021  −0.058    0    −0.114    0    0.128     0    −0.001    0
D21S4    ETH131    12    0.827  0.048  0.068  −0.022  −0.006    0    0.041     2    −0.062    0    0.016     1
D21S124  INRA031§  15    0.780  0.247  0.093  0.170   0.055     0    0.291***  3    0.176     0    0.202**   3
D23S15   INRA064   6     0.753  0.220  0.182  0.046   0.095     0    0.099     0    −0.061    0    0.176*    0
D27S20   INRA016   10    0.810  0.018  0.065  −0.051  0.091     0    −0.018    0    −0.195**  1    −0.017    0
D27S16   INRA027   6     0.764  0.154  0.095  0.065   0.185*    0    0.038     0    0.051     0    0.077     0
Averages           9.3   0.703  0.150  0.090  0.065   0.113     0.3  0.067     1.1  0.039     0.2  0.123     0.8

1 Total number of observed alleles.
2 Expected heterozygosity.
3 Number of alleles observed in that breed only.
4 Loci marked § (set in bold italics in the original) have not been used in the assignment tests.
*P < 0.05 (homozygosity test); **P < 0.01 (homozygosity test); ***P < 0.001 (homozygosity test).

RESULTS

Genetic Variation Among and Within Breeds

A total of 269 animals, evenly distributed among the 4 breeds (Charolais = 69, Chianina = 67, Frisona = 66, and Limousin = 67), was typed for 19 microsatellite loci (Table 1). Observed number of alleles per locus in the total sample was between 3 (INRA005) and 15 (INRA031 and INRA050), and the expected heterozygosity (E{H}) was between 0.400 (INRA035) and 0.827 (ETH131).
A remarkable proportion of all alleles (44 of 177; 25%) was found in a single breed only; among these, 20 alleles were private to Chianina (2 with frequencies >10%), 16 to Limousin, 5 to Charolais, and 3 to Frisona.

Analysis using Wright’s F-statistics revealed a high level of homozygote excess in the total sample (FIT = 15%, averaged over loci); this was in large part due to variation of gene frequency among breeds (FST = 9%) and to a lesser extent to homozygote excess within breeds (FIS = 6.5%). Analysis of homozygote excess performed at the level of each breed (this index is called fIT to distinguish it from the analogous parameter computed for the total sample; see Table 1) revealed 2 loci (INRA035 and INRA025) with highly significant homozygote excess in 3 breeds each. This excess was very high on an absolute scale (e.g., 62.5% in Limousin for INRA035 and 49.8% in Charolais for INRA025), suggesting that it was due to experimental artifacts rather than to real genetic effects. Two other loci (INRA011 and INRA031) also displayed FIS values >10% because of large homozygote excess in some breeds; therefore, these 4 loci were excluded from further analyses. After this modification, the new mean values of Wright’s F-statistics were as follows: FIT = 0.101, FST = 0.089, and FIS = 0.013.

As 2 pairs of loci were located on the same chromosome (INRA037/INRA072 and INRA016/INRA027, respectively), we tested all possible pairs of loci for linkage disequilibrium within each breed. A total of 420 pairwise tests were carried out (105 in each breed), and 57 of them (13.6%) showed P-values < 0.05; this proportion is significantly greater than expected by chance (expected number = 21.0; χ2 = 64.9; 1 df). The 2 syntenic locus pairs contributed only 2 significant contrasts of 57, thereby revealing the presence of highly significant gametic imbalance in all breeds.

Genetic distance between breeds measured by pairwise FST (6 total comparisons) was calculated using the 15 validated loci. All 6 contrasts were significant at P < 0.001. Frisona (the dairy breed) was the most differentiated (FST = 0.134, 0.102, and 0.101 with Chianina, Charolais, and Limousin, respectively), whereas Limousin and Charolais were less differentiated (FST = 0.041). Figure 1 shows the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) tree corresponding to the FST distance matrix.

Figure 1. Unrooted dendrogram of the 4 breeds based on the FST pairwise distance matrix calculated on 15 highly variable genetic marker (STR) loci.

Genotype Likelihoods

We first applied the likelihood calculation implemented in the ARLEQUIN package and compared the results with those obtained by our posterior estimation of allele frequencies. As an example, Figure 2 shows the results of Chianina vs. Limousin. Two log(LR) distributions are shown: the assignments of true Chianina animals to their breed (rather than to Limousin) and the false assignments of Limousin animals to Chianina, respectively. The 2 distributions are more separated from each other using ARLEQUIN, showing that our method of correcting for the missing alleles is more conservative. Therefore, we applied it to all subsequent analyses, using the set of 15 loci validated in the previous section. Figure 3 shows the plots of the log-likelihoods of all animals of each breed calculated for their true breed (dots) in comparison with the log-likelihoods that they are assigned to the other 3 breeds. It is evident that Frisona is the most discriminated from all other breeds, whereas Limousin and Charolais are the least discriminated; for example, the likelihood of 2 Limousin animals being Charolais is higher than that of being Limousin, and the likelihoods are very close for 2 other animals.

Assignment Tests
To estimate the statistical power of the assignment tests at greater resolution than was possible given the available sample sizes, we initially opted for a simulation approach (Roques et al., 1999). For each breed, we generated 10,000 multilocus genotypes and calculated the log(LR) of breed assignment; however, the average log(LR) obtained in the simulations was lower than that of the real animals by about 3 log units for all true assignments and by approximately 2 log units for all false assignments (data not shown). We then abandoned this approach and performed the statistical analysis by recourse to the theory of the normal distribution, using the mean and SD of the log(LR) calculated from the samples. Deviations from normality were subjected to a preliminary Kolmogorov-Smirnov test; none of the 12 tests (3 pairwise contrasts for each breed) was significant. Figure 4 shows, as an example, the log(LR) distributions obtained contrasting true Limousin animals against the Charolais falsely attributed to Limousin; corresponding normal distributions are superimposed. Limousin and Charolais have been chosen for display because these are the least discriminated breeds. In fact, the log(LR) of some Limousin animals is <0, meaning that their genotype would be more likely among the Charolais than among the Limousin. In the settings of Figure 4, they are to be considered as false negatives. (They become false positives when the presumed breed is Charolais and the tested breed is Limousin.) Table 2 summarizes the results of the present work. The proportions of false positives (α) and true positives (1 − β) are shown for all possible breed contrasts, including both the situations in which the presumed breed of an animal (e.g., Chianina) is tested against an alternative hypothesis (e.g., Limousin) and the opposite situation (the assumed breed is Limousin and the tested breed is Chianina). 
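The normal-theory step described above can be sketched as follows: given the mean and SD of the log(LR) distribution for true members of the presumed breed and for animals of the alternative breed, the true-positive and false-positive proportions at a test value are upper-tail areas of the two fitted normal curves. The function name is hypothetical, and the means and SDs below are invented for illustration because the paper does not report them.

```python
from statistics import NormalDist


def rates_at_threshold(true_mean, true_sd, false_mean, false_sd, threshold):
    """Return (power, alpha) for the rule 'accept the presumed breed if log(LR) > threshold'.

    power = 1 - beta: upper tail of the log(LR) distribution of true members.
    alpha: upper tail of the log(LR) distribution of animals from the other breed.
    """
    power = 1.0 - NormalDist(true_mean, true_sd).cdf(threshold)
    alpha = 1.0 - NormalDist(false_mean, false_sd).cdf(threshold)
    return power, alpha


# Hypothetical means and SDs, chosen only to make the example run.
for test_value in (0, 1, 2):
    power, alpha = rates_at_threshold(9.0, 3.0, -9.0, 3.0, test_value)
    print(test_value, round(power, 4), round(alpha, 4))
```

Raising the test value lowers both proportions, which is the trade-off between sensitivity and specificity summarized for the real contrasts in Table 2.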
The column “Probability” shows the value of the posterior probability that any animal with log(LR) greater than the test value is derived from the presumed breed, assuming equal priors of the 2 breeds. Three test values of the log(LR) are chosen for display (0, 1, and 2), corresponding to increasing significance levels (α) of the assignments. It is evident from these data that discriminating between Limousin and Charolais is the most difficult situation, particularly when the presumed breed is Charolais. In this case, even a log(LR) >2 produces a posterior probability of assignment <98.5%. In all other contrasts, a probability >99.5% is attained at the test value log(LR) >0 (except the Frisona vs. Limousin case). At the test value log(LR) >2, the probability of assignment is >99.9% in 11 of 12 contrasts, and the power of the test (1 − β) is >99% in 9 of 12 contrasts.

Figure 2. Results of the assignment tests by ARLEQUIN (open circles) and by our calculation (filled circles). The distributions of the log-likelihood ratio [log(LR)] values that Chianina animals are correctly assigned to the Chianina breed (right curves) are compared with the distributions of Limousin animals falsely classified as Chianina (left). Points represent the moving averages of 3 log(LR) intervals, each of 5 log units.

Figure 3. Individual log-likelihood of breed assignments. Charts show the 4 likelihood values calculated for each animal of each breed, including the likelihood that they are drawn from their true breed (dotted line, ordered by increasing values) and the likelihood that they are drawn from the other 3 breeds (unbroken and dashed lines).

Figure 4. The log-likelihood ratio [log(LR)] distributions of the true Limousin vs. Charolais contrast. Bars = true animals; curves = normal distributions with mean and standard deviations estimated from true animals.

Table 2. Power analysis of breed allocation

                        Test value    True positives  False positives                                        True positives  False positives
Contrast                of log(LR)1   (1 − β)         (α)              Probability2  Reverse contrast        (1 − β)         (α)              Probability2
Chianina vs. Limousin   0             0.9981          0.0024           0.9976        Limousin vs. Chianina   0.9976          0.0019           0.9981
                        1             0.9968          0.0015           0.9985                                0.9960          0.0011           0.9989
                        2             0.9946          0.0008           0.9991                                0.9936          0.0006           0.9994
Chianina vs. Charolais  0             0.9987          0.0011           0.9989        Charolais vs. Chianina  0.9989          0.0013           0.9987
                        1             0.9975          0.0005           0.9995                                0.9978          0.0007           0.9993
                        2             0.9956          0.0003           0.9997                                0.9959          0.0004           0.9996
Chianina vs. Frisona    0             1.0000          0.0000           1.0000        Frisona vs. Chianina    1.0000          0.0000           1.0000
                        1             0.9999          0.0000           1.0000                                1.0000          0.0000           1.0000
                        2             0.9998          0.0000           1.0000                                0.9999          0.0000           1.0000
Limousin vs. Charolais  0             0.9616          0.0050           0.9949        Charolais vs. Limousin  0.9950          0.0384           0.9629
                        1             0.9444          0.0019           0.9980                                0.9882          0.0258           0.9746
                        2             0.9217          0.0007           0.9993                                0.9744          0.0169           0.9830
Limousin vs. Frisona    0             0.9920          0.0000           1.0000        Frisona vs. Limousin    1.0000          0.0080           0.9921
                        1             0.9890          0.0000           1.0000                                1.0000          0.0057           0.9944
                        2             0.9849          0.0000           1.0000                                0.9999          0.0040           0.9960
Charolais vs. Frisona   0             0.9985          0.0001           0.9999        Frisona vs. Charolais   0.9999          0.0015           0.9985
                        1             0.9975          0.0000           1.0000                                0.9998          0.0009           0.9991
                        2             0.9960          0.0000           1.0000                                0.9995          0.0005           0.9995

1 LR = likelihood ratio.
2 Computed as (1 − β)/(1 − β + α).

DISCUSSION

In the current study, we determined the potential of assigning individuals among cattle breeds using STR. We adopted an apparently straightforward approach, consisting of the following steps: 1) use all available genetic information (19 typed loci) as the basis of individual assignment, 2) use a popular software package to calculate the likelihoods of the multilocus genotypes, 3) use computer simulations to generate large samples of animals to obtain accurate significance levels of the assignment tests.
Experience gained during this work showed that none of these steps could be applied without caveats.

Analysis of genetic variation among and within breeds highlighted 4 loci with exceedingly high FIS values. Inspection of observed and expected heterozygosity at the individual breed level (fIT) showed that this effect was due to a large excess of homozygous genotypes in all 4 breeds for the loci INRA035 and INRA025. Locus INRA035 showed a large excess of homozygous individuals in 2 other reports (Jordana et al., 2003; Casellas et al., 2004) with FIS = 0.384 and 0.643, respectively. A likely explanation is that some heterozygous individuals are classified as homozygotes because of failed amplification of one allele (Jordana et al., 2003; SanCristobal et al., 2003). These results highlight the importance of reporting locus-by-locus F-statistics in the genetic analyses of cattle, to identify the loci that are more likely to be subject to experimental artifacts. After removal of the problematic loci, a residual small excess of homozygotes (FIS = 1.3%) remained, but this is expected in samples composed of individuals coming from several different rearing farms and could be due to within-breed Wahlund’s effect (i.e., to population stratification at the breed level). Considering the index of genetic differentiation among breeds measured by FST (0.09), our estimate was similar to that obtained in other studies performed in cattle (Schmid et al., 1999; Maudet et al., 2002).

The likelihood calculation performed by a popular program has proven to be another critical issue. As many as 25% of the alleles observed in the total sample were present in one breed only, so that the occurrence of genotypes with zero likelihood would be quite frequent without correction for the missing alleles.
This question must be considered carefully in the validation of assignment tests in cattle, as any given choice may have important consequences on the distributions of the log(LR), and practical applications of assignment tests may ultimately imply charges of fraud (Primmer et al., 2000; Manel et al., 2002); a conservative approach is therefore advisable. We adopted the solution of estimating allele frequencies as posterior probabilities, given the actual counts in a sample, and assuming a uniform prior distribution. This method has the advantage of providing a correction that reduces the importance of all rare alleles in the assignment tests, not only of the alleles that are missing in a sample. For example, if an allele counts 1 in a subsample and 2 in another of the same size, the odds of their frequencies are 1:2 using the usual frequency calculation, whereas the odds become 2:3 using this method.

The third step of our original design was to obtain accurate estimates of false-positive and true-positive rates using large simulated samples of the 4 breeds. However, the log(LR) distributions obtained for the simulated animals were significantly different from those calculated for the real animals, thus questioning the validity of this approach. A possible explanation of this discrepancy is the large level of gametic imbalance detected within all 4 breeds, in accordance with a genome-wide study in cattle (Farnir et al., 2000). This hypothesis could be verified only by taking into account the allelic association in generating the simulated animals; we will consider this aspect in future extensions of the present work. In any case, our results suggest that computer simulations of multilocus genotypes that assume Hardy-Weinberg and linkage equilibrium should be used with caution.

The final estimated capability of the 15 validated STR to allocate individuals from the 4 chosen breeds to their correct source is remarkable.
A recent work on 3 Italian cattle breeds different from those investigated here (Moioli et al., 2004) reported a much lower power of individual assignments. The discrepancy may in part be explained by the lower genetic differentiation measured between these breeds (FST = 0.06); however, we suggest that the main reason is that those researchers have explored the potential of assigning individuals to a breed using a program that infers a population structure of a sample based on the pooled genotypic data (and then calculates the probability that each animal belongs to each inferred group). Alternatively, we have used the known information on the breed origin of each individual. The first approach has been developed to deal with wild species, in which the population origin of any given animal is generally not apparent; in contrast, the phenotype differences among domesticated animals generally allow for unequivocal breed identification, meaning that the statistical properties of any assignment test can effectively be assessed by determining sensitivity and specificity in samples of known source. How can an assignment test be implemented in practice? Let us suppose that we want to verify that a meat-cut comes from a Chianina animal, as it might be stated in the label. This is far from an implausible scenario, as the market price of Chianina is almost twice that of other breeds present on the Italian beef market. After the just-mentioned 15 loci are typed, the log(LR) that this sample is from a Chianina rather than from any of the other 3 breeds is calculated using the allele frequencies estimated in the present work. Then, we see from Table 2 that a value of log(LR) >0 has probability >99.8% if the cut is truly from Chianina rather than, for example, from Limousin, whereas it has probability <0.2% if it is from Limousin. Assuming equal priors for those 2 breeds, we see that the posterior probability that a sample with log(LR) >0 is truly from Chianina is >99.76%. 
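The posterior probability quoted above can be checked with a short Python sketch that uses the Table 2 values for the Chianina vs. Limousin contrast at log(LR) > 0 (1 − β = 0.9981, α = 0.0024). The function name is hypothetical; the `prior` argument is an added assumption that generalizes the equal-priors formula γ/(γ + 1) through Bayes' rule, and it is not part of the original analysis.

```python
def posterior_presumed_breed(power, alpha, prior=0.5):
    """Posterior probability of the presumed breed given log(LR) above the test value.

    With equal priors (prior=0.5) this reduces to gamma / (gamma + 1),
    where gamma = power / alpha = (1 - beta) / alpha.
    """
    return power * prior / (power * prior + alpha * (1.0 - prior))


# Chianina vs. Limousin at log(LR) > 0, values taken from Table 2.
print(round(posterior_presumed_breed(0.9981, 0.0024), 4))  # 0.9976
```

Under equal priors the result matches the 99.76% given in the text; a lower prior for the presumed breed would lower the posterior accordingly.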
The corresponding probabilities of assignments to Chianina when the contrasting hypotheses are Charolais and Frisona are 99.9 and 99.999%, respectively. If we assume more stringent assignment criteria (e.g., log(LR) >1 or >2), the posterior probabilities that the meat-cut is from Chianina become overwhelming. Suppose, conversely, that a meat-cut labeled as Chianina comes from a breed other than Chianina. Then, the probabilities of obtaining a log(LR) >0 for the Chianina are 0.24, 0.11, and <0.01% for samples from Limousin, Charolais, and Frisona, respectively; therefore, a log(LR) <0 has a very high probability of being obtained, and the posterior probabilities that the meat-cut is from one of the other breeds are 99.81, 99.87, and >99.99%, respectively. This evidence is certainly compelling.

The present analysis strictly applies to the 4 considered breeds; however, the power of the assignment tests depends closely on the level of genetic heterogeneity existing between breeds, which is conveniently measured by FST. Thus, this index can be used to anticipate the potential of allocating individuals among cattle breeds (Bjornstad and Roed, 2002).

An important area of future research will be the optimization of the cost:benefit ratio of the assignment test. For example, our work shows that a given predetermined level of statistical significance (e.g., α < 1%) can be obtained with a number of STR <15 for a substantial fraction of the animals. This means that a sequential test based on the consecutive typing of several sets of markers (say, 5 markers per set) could decrease the number of total typed genotypes substantially, while keeping constant the value of α. This procedure could eventually produce a cost-effective, routinely implemented test.
Another area of future development will be to implement a test that can answer the question “what is the probability that this animal is actually from this breed?” in a single step, thereby avoiding the need to perform multiple pairwise tests. This problem could be addressed by collecting the allele frequencies of all possible alternative breeds and by forming a pooled database of the mean allele frequencies, weighted by the population size of each contributing breed.
LITERATURE CITED
Bjornstad, G., and K. H. Roed. 2001. Breed demarcation and potential for breed allocation of horses assessed by microsatellite markers. Anim. Genet. 32:59–65.
Bjornstad, G., and K. H. Roed. 2002. Evaluation of factors affecting individual assignment precision using microsatellite data from horse breeds and simulated breed crosses. Anim. Genet. 33:264–270.
Blott, S. C., J. L. Williams, and C. S. Haley. 1999. Discriminating among cattle breeds using genetic markers. Heredity 82:613–619.
Buchanan, F. C., L. J. Adams, R. P. Littlejohn, J. F. Maddox, and A. M. Crawford. 1994. Determination of evolutionary relationships among sheep breeds using microsatellites. Genomics 22:397–403.
Casellas, J., N. Jimenez, M. Fina, J. Tarres, A. Sanchez, and J. Piedrafita. 2004. Genetic diversity measures of the bovine Alberes breed using microsatellites: Variability among herds and types of coat colour. J. Anim. Breed. Genet. 121:101–110.
Ciampolini, R., H. Leveziel, E. Mazzanti, C. Grohs, and D. Cianci. 2000. Genomic identification of the breed of an individual or its tissue. Meat Sci. 54:35–40.
Ciampolini, R., K. Moazami-Goudarzi, D. Vaiman, C. Dillmann, E. Mazzanti, J. L. Foulley, H. Leveziel, and D. Cianci. 1995. Individual multilocus genotypes using microsatellite polymorphisms to permit the analysis of the genetic variability within and between Italian beef cattle breeds. J. Anim. Sci. 73:3259–3268.
Davies, N., F. X. Villablanca, and G. K. Roderick. 1999. Determining the source of individuals: Multilocus genotyping in nonequilibrium population genetics. Trends Ecol. Evol. 14:17–21.
Edwards, A. W. F. 1992. Likelihood: Expanded Edition. The Johns Hopkins Univ. Press, Baltimore, MD.
Farnir, F., W. Coppieters, J. J. Arranz, P. Berzi, N. Cambisano, B. Grisart, L. Karim, F. Marcq, L. Moreau, M. Mni, C. Nezer, P. Simon, P. Vanmanshoven, D. Wagenaar, and M. Georges. 2000. Extensive genome-wide linkage disequilibrium in cattle. Genome Res. 10:220–227.
Guinand, B., A. Topchy, K. S. Page, M. K. Burnham-Curtis, W. F. Punch, and K. T. Scribner. 2002. Comparisons of likelihood and machine learning methods of individual classification. J. Hered. 93:260–269.
Jeanpierre, M. 1987. A rapid method for the purification of DNA from blood. Nucleic Acids Res. 15:9611.
Jordana, J., P. Alexandrino, A. Beja-Pereira, I. Bessa, J. Canon, Y. Carretero, S. Dunner, D. Laloe, K. Moazami-Goudarzi, A. Sanchez, and N. Ferrand. 2003. Genetic structure of eighteen local south European beef cattle breeds by comparative F-statistics analysis. J. Anim. Breed. Genet. 120:73–87.
Koskinen, M. T. 2003. Individual assignment using microsatellite DNA reveals unambiguous breed identification in the domestic dog. Anim. Genet. 34:297–301.
Manel, S., P. Berthier, and G. Luikart. 2002. Detecting wildlife poaching: Identifying the origin of individuals with Bayesian assignment test and multilocus genotypes. Conserv. Biol. 16:650–659.
Maudet, C., G. Luikart, and P. Taberlet. 2002. Genetic diversity and assignment tests among seven French cattle breeds based on microsatellite DNA analysis. J. Anim. Sci. 80:942–950.
McKean, J. D. 2001. The importance of traceability for public health and consumer protection. Rev. Sci. Tech. 20:363–371.
Moioli, B., F. Napolitano, and G. Catillo. 2004. Genetic diversity between Piedmontese, Maremmana, and Podolica cattle breeds. J. Hered. 95:250–256.
Narrod, C. A., and K. O. Fuglie. 2000. Private-sector investment in livestock breeding. Agribusiness 4:457–470.
Paetkau, D., W. Calvert, I. Stirling, and C. Strobeck. 1995. Microsatellite analysis of population structure in Canadian polar bears. Mol. Ecol. 4:347–354.
Paetkau, D., G. F. Shields, and C. Strobeck. 1998. Gene flow between insular, coastal and interior populations of brown bears in Alaska. Mol. Ecol. 7:1283–1292.
Paetkau, D., R. Slade, M. Burden, and A. Estoup. 2004. Genetic assignment methods for the direct, real-time estimation of migration rate: A simulation-based exploration of accuracy and power. Mol. Ecol. 13:55–65.
Presciuttini, S., G. Paoli, and M. Franceschi. 1990. Surname versus gene structure of a small isolate. Ann. Hum. Genet. 54:79–90.
Primmer, C. R., M. T. Koskinen, and J. Piironen. 2000. The one that did not get away: Individual assignment using microsatellite data detects a case of fishing competition fraud. Proc. R. Soc. Lond. B Biol. Sci. 267:1699–1704.
Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945–959.
Rannala, B., and J. L. Mountain. 1997. Detecting immigration by using multilocus genotypes. Proc. Natl. Acad. Sci. USA 94:9197–9201.
Roques, S., P. Duchesne, and L. Bernatchez. 1999. Potential of microsatellites for individual assignment: The North Atlantic redfish (genus Sebastes) species complex as a case study. Mol. Ecol. 8:1703–1717.
SanCristobal, M., C. Chevalet, J. L. Foulley, and L. Ollivier. 2003. Some methods for analysing genetic marker data in biodiversity setting: Example of the pigbiodiv data. Archivos de Zootecnia 52:173–183.
Schmid, M., N. Saitbekova, C. Gaillard, and G. Dolf. 1999. Genetic diversity in Swiss cattle breeds. J. Anim. Breed. Genet. 116:1–8.
Schneider, S., D. Roessli, and L. Excoffier. 2000. Arlequin ver 2.000: A software for population genetics data analysis. Genetics and Biometry Laboratory, University of Geneva, Switzerland.
Sokal, R. R., and F. J. Rohlf. 1995. Biometry. 3rd ed. W. H. Freeman and Co., New York, NY.
Steffen, P., A. Eggen, A. B. Dietz, J. E. Womack, G. Stranzinger, and R. Fries. 1993. Isolation and mapping of polymorphic microsatellites in cattle. Anim. Genet. 24:121–124.
Vaiman, D., D. Mercier, K. Goudarzi, A. Eggen, R. Ciampolini, A. Lepingle, R. Velmala, J. Kaukinen, S. L. Varvio, P. Martin, H. Leveziel, and G. Guerin. 1994. A set of 99 cattle microsatellites: Characterization, synteny mapping, and polymorphism. Mammal. Genome 5:288–297.
Waser, P. M., and C. Strobeck. 1998. Genetic signature of interpopulation dispersal. Trends Ecol. Evol. 13:43–44.
Weir, B. S. 1996. Genetic Data Analysis II. Sinauer, Sunderland, MA.