Aquaculture 250 (2005) 70 – 81
www.elsevier.com/locate/aqua-online
Evaluation of three strategies using DNA markers for
traceability in aquaculture species
Ben Hayes*, Anna K. Sonesson, Bjarne Gjerde
AKVAFORSK, Institute for Aquaculture Research, P.O. 5010, 1432 Ås, Norway
Received 14 May 2004; received in revised form 27 January 2005; accepted 2 March 2005
Abstract
Traceability schemes for aquaculture species are essential for tracing market product to farm of origin in the event of
detection of disease or toxins in the market fish. DNA markers have been proposed as a tool for traceability. These markers
can be used to genotype fish by taking a sample from live fish or fish product at any stage along the production chain. In
this paper, we consider three alternate traceability schemes using DNA markers. The example of the Norwegian farmed
Atlantic salmon industry was used. This industry, like many aquaculture industries, has three tiers, the nucleus, multiplier and
commercial tiers. The nucleus individuals are grandparents of the commercial fish, and the multiplier individuals are the
parents of the commercial fish. The traceability strategies we considered were: (1) FS, assignment of market place fish to full
sib families based on the marker information (this strategy assumes all individuals from a full sib family are allocated to a
single farm and a limited number of fish, representing all full sib families on that farm, are genotyped); (2) PAR, assignment
of market place fish to parents (multiplier individuals) and (3) GRAND, assignment of market place fish to grandparents
(nucleus individuals). Using simulation, we determined the number of DNA markers required to achieve 95% of correct
assignment decisions for each strategy. The simulation included a wild population which contributed to market place fish.
The wild fish were correctly assigned if they were excluded from belonging to the farmed population in each strategy,
otherwise they were incorrectly assigned. Both microsatellite markers and single nucleotide polymorphism markers were
considered. Seventy five, 15, and 50 microsatellites were required to achieve 95% correct assignment decisions for FS, PAR
and GRAND, respectively. Four hundred, 75 and 200 SNPs were required to achieve 95% correct assignment decisions for
FS, PAR and GRAND, respectively. If the cost of genotyping microsatellites is assumed to be five times as high as
genotyping a SNP, GRAND using SNP markers is the cheapest strategy. The logistics of implementing each strategy are
discussed. GRAND in particular, and PAR in some industries, require complicated logistics. The most suitable and cost
effective traceability strategy for a particular industry will depend heavily on the organisation of that industry, for example
the degree of recording transfer of fish, eggs and larvae between tiers. Even if complicated logistics prevent the adoption of
marker based schemes by some industries, traceability with DNA markers may still be important for verification of labelling-based schemes.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Traceability; SNP markers; Microsatellites; Accuracy; Aquaculture species
* Corresponding author. Tel.: +47 6494 9542; fax: +47 6494 9502.
E-mail address: [email protected] (B. Hayes).
0044-8486/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.aquaculture.2005.03.008
1. Introduction
Traceability schemes allow consumers to obtain
information on the origin and the production chain of
food products. Such schemes are essential for tracing
of fish back to farm of origin, for example in the event
of detection of disease or toxins in market fish. Such
schemes could also be instrumental in monitoring
producers to minimize the number of escapees from
the farms and thus reducing the environmental load of
farming (for example parasite problems and possible
interbreeding with wild fish).
Håstein et al. (2001) reviewed methods of traceability available for aquaculture species. The methods reviewed included external tags, chemical
marking using inorganic substances, physical marking (e.g. fin clipping), labelling of product (e.g.
documents relating to movement, invoices, etc) and
DNA markers. Using DNA markers in traceability
schemes is attractive for a number of reasons. The
DNA markers can be genotyped by taking a sample
from the fish or fish product at any stage along the
production chain, and analysing this sample in the
laboratory. A very small sample of tissue is required
for DNA analysis, so the method can also be used on
live fish. Two types of DNA markers have been
suggested for use in traceability schemes, microsatellites and single nucleotide polymorphisms
(SNPs). Considerable numbers of microsatellite loci
are now available for tilapias, rainbow trout, Atlantic
salmon (e.g. Kocher et al., 1998; Sakamoto et al.,
2000; Gilbey et al., 2004) and have been or are
being developed for many other species, while large
numbers (>1000) of SNPs have been described for
both salmonids (Hayes et al., 2004; Smith et al.,
2003) and catfish (He et al., 2003). Microsatellite
markers are highly informative, with many alleles at
each locus, while SNPs typically have only two
alleles per locus. However, microsatellites are more
expensive than SNPs to genotype (Glaubitz et al.,
2003). DNA markers can also be used to verify that
traceability schemes using other methods, such as
labelling, are accurate.
One option to implement a traceability scheme
with DNA markers would be to genotype all farmed
fish and store their genotypes in a data base. When
fish are sampled from the market place, their
genotypes would be compared to the genotypes in
the data base, in order to determine farm of origin.
Unfortunately for many aquaculture industries, the
large number of farmed animals and thus the
enormous cost of genotyping required mean that this
option is not feasible. So schemes which reduce the
amount of genotyping are required.
One alternative scheme would be possible if entire
full sib families are always allocated to a single farm.
In this case, if enough individuals are genotyped from
a single farm such that all full sib families are
represented in the sample, then a fish from the market
place can be assigned to farm of origin by reconstructing the full sib relationships from the marker
genotypes.
Other alternative traceability schemes with DNA
markers could take advantage of the multi-tier
structure which exists in most aquaculture industries.
Generally a relatively small nucleus population,
where selective breeding takes place, supplies eggs
or larvae to much larger grow-out or commercial
operations. So a feasible traceability scheme may be
to genotype only the parent individuals in the
nucleus and store their genotypes. Fish sampled
from the market place (offspring of nucleus individuals) could then be assigned to their parents using
marker information. Assignment of fish to parents or
populations with a high degree of accuracy with
DNA markers has been demonstrated using simulated data (e.g. SanCristobal and Chevalet, 1997) and
in living populations of turbot, rainbow trout and
Atlantic salmon (Estoup et al., 1998; Letcher and
King, 2001; Villanueva et al., 2002). It is important
to note that this strategy does not achieve full
traceability directly, as market fish are traced back to
nucleus parents. To further trace fish to farm of
origin, the allocation of eggs resulting from a mating
in the nucleus to commercial operations would have
to be recorded. In some industries this may, due to
complicated logistics, become a difficult task.
Some aquaculture industries, such as the Norwegian salmon industry, have an additional tier, the
multiplier. This tier is required when the nucleus is
unable to supply all commercial farms with eggs or
larvae. The multiplier tier takes eggs or larvae from
the nucleus, grows these to broodfish, mates these
broodfish and then supplies the commercial tier with
the resulting eggs or larvae. In industries which have
this three tier structure, another traceability strategy
is possible. All the parent fish in the multipliers are
genotyped, and this information is stored in a data
base. Fish sampled from the market place can then
be assigned back to the nucleus parents (i.e., their
grandparents) or to the multiplier parent (i.e., their
parents). Letcher and King (2001) showed that fish
could be accurately assigned to their grandparents
provided that a sufficient number of markers are
used. This scheme has quite complicated logistic
requirements, as the destination of both the eggs
from the nucleus matings and eggs from the multiplier matings must be recorded.
The aim of this paper was to assess the feasibility,
in terms of the number of markers and number of fish
required to be genotyped, of alternate traceability
schemes using DNA markers in aquaculture species.
We have used computer simulation to evaluate
alternate schemes. The ultimate goal of traceability
systems is to trace a fish from any point in the
production chain to any other point in the production
chain. None of the strategies above achieve this
directly, and the logistical considerations required by
each strategy are discussed.
2. Methods
2.1. Structure of the simulation
We have loosely based our simulation on the
example of the Norwegian salmon industry. Two
closed nucleus populations, representing two breeding
companies or two different sub-populations of one
breeding company, were simulated. The founders of
the nucleus were sampled from a large simulated wild
population (of 1000 individuals). For ten generations,
within each nucleus, 30 males and 60 females were
selected at random and mated, in the mating ratio of
one male to two females, and with 10 offspring per
mating. After the ten generations of breeding (random
mating and no selection) in the nuclei, 30 males and
60 females were randomly selected from each of the
nuclei populations and mated to produce two multiplier populations.
From these two multiplier populations, 300 males
and 600 females were selected at random from each
population and mated to produce a total commercial
population of 12,000 fish. These fish belonged to
fifty commercial operations, with the offspring from
the first 24 of the matings among the parents
belonging to the first commercial operation, the
offspring of the next 24 matings belonging to the
next commercial operation, and so on. As there were
10 offspring per family, there were 240 individuals
per commercial operation. Simultaneously, the wild
population continued to breed, maintaining a constant size of 1000. Note that the wild population
here could also include farmed fish originating
from breeding operations in different countries (as
long as parents or grandparents did not belong to the
nucleus or multiplier operations described above).
Additionally, in practice the wild fish may in fact be
escapees from fish farms. However in this case they
should be identified by the traceability schemes (see
methods below and Discussion) as being of farmed
origin. The commercial and wild population together
constituted the fish in the market place. A single fish
was sampled from the market place for purposes of
assignment. There were at least 200 replicates for each
scheme.
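The sizes quoted for the commercial tier follow from the mating design. The short sketch below reproduces that arithmetic; it assumes that each selected female contributes exactly one mating (our reading of the 1:2 mating ratio), and the helper `operation_of_mating` is a hypothetical name introduced only for illustration.

```python
# Illustrative arithmetic for the simulated commercial tier.
# All numeric values are taken from the text; the assumption that each
# female corresponds to exactly one mating is ours.
N_MULTIPLIERS = 2
FEMALES_PER_MULTIPLIER = 600      # 300 males : 600 females per multiplier
OFFSPRING_PER_MATING = 10
MATINGS_PER_OPERATION = 24

n_matings = N_MULTIPLIERS * FEMALES_PER_MULTIPLIER
n_commercial_fish = n_matings * OFFSPRING_PER_MATING
n_operations = n_matings // MATINGS_PER_OPERATION
fish_per_operation = MATINGS_PER_OPERATION * OFFSPRING_PER_MATING


def operation_of_mating(mating_index):
    """Hypothetical helper: map a 0-based mating index to the 0-based
    index of the commercial operation that receives its offspring."""
    return mating_index // MATINGS_PER_OPERATION


print(n_commercial_fish, n_operations, fish_per_operation)
```

Consecutive blocks of 24 matings are sent to the same operation, so every full sib family ends up on a single farm, which is the condition strategy FS relies on.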
In our simulation, a fish had two marker alleles at
each of a number of independently segregating loci.
Markers were either microsatellites (10 alleles per
locus, average heterozygosity 0.8), or single nucleotide
polymorphisms (SNPs) (two alleles per locus). In the
base population, the frequencies of the 10 microsatellite
alleles were sampled from a Poisson distribution (Fig.
1), for each locus. This distribution was chosen based
on the results of Skaala et al. (2004), who reported an
average of 9.9 alleles segregating within different
Atlantic salmon populations for 12 microsatellites,
with an average heterozygosity up to 0.76 in some
wild populations, and to reflect to some degree the
observation that for many loci in fish populations
there are a small number of alleles at moderate
frequencies, and a large number of alleles at low
frequencies (e.g. Letcher and King, 2001). SNPs had
two alleles with frequencies of 0.8 and 0.2 (at least
1200 human SNPs reviewed by Marth et al. (2001)
had minor allele frequencies of 0.2 or greater) giving
an average heterozygosity of 0.32. A progeny from
the mating of two parents received an allele from each
parent, with equal probability of an allele being
transmitted from either of the parent's alleles. The
number of microsatellite markers simulated was
between 5 and 200, the number of SNPs between
5 and 500.
Fig. 1. Frequency of alleles at each microsatellite locus in the simulated base population (x-axis: microsatellite allele, 1 to 10; y-axis: frequency in base population, 0 to 0.25).
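The heterozygosity value quoted for SNPs and the transmission rule for unlinked loci can be illustrated with a minimal sketch. The function names and the data layout (a genotype as a list of allele pairs, one pair per locus) are assumptions made for this example and are not taken from the simulation program itself.

```python
import random


def expected_heterozygosity(freqs):
    """Expected heterozygosity under random mating: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def make_offspring(sire, dam, rng=random):
    """Create one offspring genotype.

    Each parent is a list of loci and each locus is a pair of alleles.
    At every locus the offspring receives one allele drawn at random
    (probability 1/2 each) from each parent; loci are treated as
    independent, i.e. unlinked.
    """
    return [(rng.choice(s_locus), rng.choice(d_locus))
            for s_locus, d_locus in zip(sire, dam)]


# SNP with allele frequencies 0.8 and 0.2, as in the text.
print(round(expected_heterozygosity([0.8, 0.2]), 2))

# Two-locus example with arbitrary allele labels.
sire = [(1, 2), (3, 3)]
dam = [(2, 2), (4, 5)]
print(make_offspring(sire, dam))
```

With frequencies 0.8 and 0.2 the expected heterozygosity is 1 - (0.64 + 0.04) = 0.32, as stated above; for a biallelic marker the same expression can never exceed 0.5.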
For computational reasons, we were unable to
simulate the full size of the Norwegian salmon
industry. Additionally, the smolt rearing and grow-out operations which are a part of this industry were
considered as a single entity (a commercial operation),
as both these sets of operations use fish from the
same generation. The population simulated is shown
in Fig. 2.
2.2. Traceability strategies
Three different traceability strategies were then
compared for the number of markers (either microsatellites or SNPs) required to achieve a 95% level of
accuracy in assigning fish back to commercial
operation, multiplier or nucleus of origin.
2.2.1. Strategy FS
In strategy FS (for full sib), either 25, 50, 100 or
150 fish were sampled from each commercial
operation at random and genotyped. A fish was then
sampled from the market place. The fish from the
commercial operations and the fish sampled from the
market place were genotyped for 25, 50, 100, 150 or
200 markers (microsatellites or SNPs). Using this
marker information, the relationship between the
market fish and each of the fish sampled from the
commercial operations was calculated as in Eding and
Meuwissen (2001). For a given single locus l, a
similarity index $S_{xy}$ between two individuals x and y
is calculated, where $S_{xy} = 1$ when genotype x = ii (i.e.
both alleles at locus l are identical) and genotype y = ii,
or when x = ij and y = ij; $S_{xy} = 0.5$ when x = ii and y = ij,
or vice versa; $S_{xy} = 0.25$ when x = ij and y = ik; and
$S_{xy} = 0$ when the two individuals have no alleles in
common at the locus. The similarity as a result of
chance alone was
$$ s = \sum_{i=1}^{a} p_i^2 $$
where $p_i$ is the frequency of allele i in the (random mating)
population, and a is the number of alleles at the locus.
Fig. 2. Population structure for simulation. There were two nuclei populations, two multipliers, and fifty commercial operations. Thirty males
and 60 females were selected from each nucleus to breed each multiplier tier, and 300 males and 600 females were selected from each multiplier
to breed 12,000 commercial offspring, split across 50 commercial operations of 240 fish each.
Then the relationship between individuals x and y at
locus l is calculated as
$$ r_l = \frac{S_{xy} - s}{1 - s} $$
The overall r, which utilized information from all loci,
was computed as the average value across all loci.
This index is appropriate in our case, as it accounts for
inbreeding that is expected to occur in a population of
finite size (Eding and Meuwissen, 2001).
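The relationship estimate described above can be sketched in a few lines. The code follows the similarity scores exactly as they are listed in this section; the function names, the genotype layout (one pair of allele labels per locus) and the per-locus frequency lists are assumptions introduced for the example.

```python
def similarity(x, y):
    """Single-locus similarity S_xy between genotypes x and y,
    scored as listed in the text. x and y are pairs of allele labels."""
    gx, gy = sorted(x), sorted(y)
    if not set(gx) & set(gy):
        return 0.0                 # no alleles in common
    if gx == gy:
        return 1.0                 # ii vs ii, or ij vs ij
    if gx[0] == gx[1] or gy[0] == gy[1]:
        return 0.5                 # ii vs ij, or vice versa
    return 0.25                    # ij vs ik


def chance_similarity(freqs):
    """Similarity expected by chance alone: s = sum(p_i^2)."""
    return sum(p * p for p in freqs)


def relationship(ind_x, ind_y, allele_freqs):
    """Average over loci of r_l = (S_xy - s) / (1 - s).

    ind_x, ind_y: lists of genotypes, one pair of alleles per locus.
    allele_freqs: list of allele-frequency lists, one per locus.
    """
    r_values = []
    for gx, gy, freqs in zip(ind_x, ind_y, allele_freqs):
        s = chance_similarity(freqs)
        r_values.append((similarity(gx, gy) - s) / (1.0 - s))
    return sum(r_values) / len(r_values)


# Two SNP loci with frequencies 0.8 / 0.2 (alleles labelled 1 and 2).
freqs = [[0.8, 0.2], [0.8, 0.2]]
fish_a = [(1, 1), (1, 2)]
fish_b = [(1, 2), (1, 2)]
print(relationship(fish_a, fish_b, freqs))
```

Because s is subtracted before averaging, a pair that matches only as often as expected by chance has an expected r near zero, and r can be negative for pairs that share fewer alleles than expected.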
The threshold value of r for two individuals to be
considered as full sibs was set at 0.375 for microsatellites and 0.5 for SNPs, as in 100 replicate
simulations with large numbers of full sibs and
markers these were the minimum r values for two
full sibs. If the value of r did not exceed the
threshold for any of the comparisons, the fish
sampled from the market place was assumed to be
of wild origin. Two hundred replicate samples were
performed, and the proportion of correct assignment
decisions (fish sampled from the market place
correctly allocated to commercial operation of origin,
or to the wild) was calculated.
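The decision rule for strategy FS can be sketched as follows. The text states only that the fish is declared wild when no comparison exceeds the threshold; choosing the operation that holds the genotyped fish with the highest r when several comparisons exceed it is an assumption made for this example, as are the function name and the input layout.

```python
def assign_fs(r_by_operation, threshold):
    """Assign a market fish to a commercial operation or to the wild.

    r_by_operation: dict mapping an operation identifier to the list of
        r values between the market fish and each genotyped fish
        sampled from that operation.
    threshold: minimum r for two fish to be considered full sibs
        (0.375 for microsatellites, 0.5 for SNPs in the text).

    Assumption: when several comparisons exceed the threshold, the
    operation with the single highest r is chosen.
    """
    best_operation, best_r = None, threshold
    for operation, r_values in r_by_operation.items():
        for r in r_values:
            if r > best_r:
                best_operation, best_r = operation, r
    return best_operation if best_operation is not None else "wild"


print(assign_fs({"farm_1": [0.10, 0.42], "farm_2": [0.20]}, 0.375))
print(assign_fs({"farm_1": [0.10], "farm_2": [0.20]}, 0.375))
```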
2.2.2. Strategy PAR
For strategy PAR (for parents), and for a fish
sampled from the market place, we calculated the
probabilities that the fish came from any of the
possible pairs of parents, following Letcher and King
(2001). For each marker, the probability that an
offspring with the genotype $A_i A_j$ is derived from
parents with genotypes $A_a A_b$ and $A_c A_d$ is:
$$ \Pr\left(A_i A_j \mid (A_a A_b), (A_c A_d)\right) = T(i \mid ab)\,T(j \mid cd) + T(j \mid ab)\,T(i \mid cd) $$
where $T(i \mid ab) = \Pr\left([A_i] \mid (A_a A_b), (A_c A_d)\right) = \tfrac{1}{2}(a = i) + \tfrac{1}{2}(b = i)$, and (a = i) and (b = i) are Boolean operators
that give the value of one if the allele value of a (or b) equals
the allele value of i, or zero otherwise. If the offspring
is a homozygote, $\Pr\left([A_i A_j] \mid (A_a A_b), (A_c A_d)\right)$ is divided
by two. The global likelihood for the offspring
conditional on the parental pair is the product of all
single locus likelihoods. The fish sampled from the
market place is considered to be the offspring of the
parental pair with the highest global likelihood. If all
the global likelihoods are zero, the fish is considered
to be of wild origin. A sampled fish was correctly
assigned if it was from multiplier parents, and was
assigned to the correct parents, or if the sampled fish
was of wild origin, it was correctly assigned if it was
excluded as the offspring of multiplier parents.
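A minimal sketch of the parental assignment rule is given below. It implements the single-locus transmission probabilities and the global likelihood as described in this section; the function names, the dictionary of candidate pairs and the genotype layout are assumptions made for illustration, and no attempt is made to reproduce the efficiency of a production implementation.

```python
def transmit(i, parent):
    """T(i|ab): probability that a parent with alleles (a, b) transmits
    allele i, i.e. 1/2 for each parental allele equal to i."""
    a, b = parent
    return 0.5 * (a == i) + 0.5 * (b == i)


def locus_likelihood(offspring, parent1, parent2):
    """Single-locus probability of the offspring genotype given a
    parental pair; halved for homozygous offspring as in the text."""
    i, j = offspring
    lik = (transmit(i, parent1) * transmit(j, parent2)
           + transmit(j, parent1) * transmit(i, parent2))
    if i == j:
        lik /= 2.0
    return lik


def global_likelihood(offspring, parent1, parent2):
    """Product of the single-locus likelihoods over all loci."""
    lik = 1.0
    for o, p1, p2 in zip(offspring, parent1, parent2):
        lik *= locus_likelihood(o, p1, p2)
    return lik


def assign_parents(offspring, candidate_pairs):
    """Return the identifier of the parental pair with the highest
    global likelihood, or "wild" if every likelihood is zero.
    candidate_pairs maps an identifier to (parent1, parent2)."""
    best_pair, best_lik = None, 0.0
    for pair_id, (p1, p2) in candidate_pairs.items():
        lik = global_likelihood(offspring, p1, p2)
        if lik > best_lik:
            best_pair, best_lik = pair_id, lik
    return best_pair if best_pair is not None else "wild"


pairs = {
    "pair_A": ([(1, 2), (3, 3)], [(1, 2), (3, 4)]),
    "pair_B": ([(5, 5), (3, 3)], [(6, 6), (3, 3)]),
}
print(assign_parents([(1, 1), (3, 4)], pairs))
print(assign_parents([(7, 7), (8, 8)], pairs))
```

A single locus at which the offspring carries an allele absent from both candidate parents sets the global likelihood for that pair to zero, which is why the exclusion of wild fish improves as markers are added.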
2.2.3. Strategy GRAND
In strategy GRAND (for grandparents), the exclusion probabilities were extended from those derived for
parentage assignment (as above) to assignment of
grandprogeny to grandparents, as described by Letcher
and King (2001). A sampled fish was correctly
assigned if it was from nuclei grandparents, and was
assigned to the correct grandparents, or if the sampled
fish was of wild origin, it was correctly assigned if it
was excluded as the offspring of multiplier parents.
3. Results
3.1. Proportion of correct assignment decisions from
FS
Our results suggest dependencies between the
number of fish genotyped from each commercial
operation and the number of markers that need to be
genotyped in the FS strategy, in order to exceed
0.95 of correct assignment decisions (Fig. 3, microsatellites; Fig. 4, SNPs). If fewer fish are sampled
from each commercial operation, a larger number of
markers must be genotyped in order to achieve 0.95
correct assignment decisions. With microsatellites
(Fig. 3), the lowest number of total genotypings
required to achieve 0.95 correct assignment decisions (i.e., number of markers per fish times
number of fish sampled per commercial operation
times number of commercial operations) was
achieved when 100 fish were sampled per commercial operation, and these fish were genotyped for
about 75 markers. As there were 24 full sib families
per commercial operation, sampling 100 fish per
farm should give on average 4 fish sampled per full
sib family. There was no advantage (in terms of
correct proportion of assignment decisions) in
sampling more than 100 fish per commercial
operation, indicating that in a sample of this size
all full sib families in the commercial operation are
sufficiently represented.
With SNPs, the 0.95 threshold was reached with
about 175 SNPs (150 fish sampled), 225 SNPs (100
fish sampled) or 400 SNPs (50 fish sampled) (Fig. 4).
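The total genotyping effort for FS is the product of markers per fish, fish sampled per operation and number of operations. The sketch below evaluates that product for the marker and sample-size combinations reported in this section; the function name is hypothetical, and the genotyping of the market fish itself is ignored.

```python
N_OPERATIONS = 50          # commercial operations in the simulation
FAMILIES_PER_OPERATION = 24


def total_genotypings(n_markers, fish_per_operation,
                      n_operations=N_OPERATIONS):
    """Markers per fish x fish sampled per operation x operations."""
    return n_markers * fish_per_operation * n_operations


# Microsatellites: about 75 markers with 100 fish per operation.
print("microsatellites:", total_genotypings(75, 100))

# SNPs: combinations that reached the 0.95 threshold
# (fish sampled per operation -> approximate number of SNPs).
snp_options = {150: 175, 100: 225, 50: 400}
for fish, markers in snp_options.items():
    print("SNPs:", fish, "fish,", markers, "markers ->",
          total_genotypings(markers, fish))

# Average number of sampled fish per full sib family with 100 fish.
print(round(100 / FAMILIES_PER_OPERATION, 1))
```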
Fig. 3. Proportion of correct assignment decisions from strategy FS with increasing numbers of fish sampled from fifty commercial operations
and genotyped for an increasing number of microsatellite markers (x-axis: number of microsatellite markers, 0 to 200; y-axis: proportion of correct assignment decisions, 0 to 1; series: 25, 50, 100 and 150 fish genotyped per farm, with the 0.95 threshold marked).
Fig. 4. Proportion of correct assignment decisions from strategy FS with increasing numbers of fish sampled from each of fifty commercial
operations and genotyped for an increasing number of SNP markers (x-axis: number of SNP markers, 0 to 500; y-axis: proportion of correct assignment decisions, 0 to 1; series: 25, 50, 100 and 150 fish genotyped per farm, with the 0.95 threshold marked).
3.2. Proportion of correct assignment decisions from
PAR and GRAND
The proportion of correct assignment decisions in
PAR increased rapidly as more markers were used
(Fig. 5). Ninety five percent of correct assignment
decisions were achieved with about 15 microsatellites,
while about 75 SNP markers were required to achieve
the same level of accuracy.
A substantially greater number of markers, either
microsatellites or SNPs, were required using GRAND
to achieve 0.95 of correct assignment decisions
compared with PAR. The number of microsatellites
required was increased approximately 3 fold from 15
to 50. The number of SNPs required was also
increased approximately 3 fold (75–200). While the
proportion of correct assignment decisions increased
rapidly as more markers were used in PAR (both
SNPs and microsatellites) and GRAND with microsatellites, the proportion of correct assignment decisions increased only gradually as more markers were
added for GRAND with SNPs. This was probably
because of the low information content of these
markers, together with the large number of potential
grandparental combinations.
Fig. 5. Proportion of correct assignment decisions from strategies PAR and GRAND with increasing number of microsatellite and SNP markers (x-axis: number of markers, 0 to 200; y-axis: proportion of correct assignment decisions, 0 to 1; series: PAR microsatellites, PAR SNPs, GRAND microsatellites, GRAND SNPs, with the 0.95 threshold marked).
4. Discussion
Our results suggest that, for the industry simulated,
accurate assignment of a fish sampled from the market
place to either the wild population or to the farmed
population can be achieved using either microsatellite
or SNP markers. For fish assigned to the farmed
population, assignment to parents, grandparents or full
sib groups is possible. Strategy PAR required the
fewest number of markers for 95% correct assignment
decisions (15 microsatellites or 75 SNPs), followed by
GRAND (50 microsatellites or 200 SNPs) and FS (75
microsatellites or 400 SNPs). We can compare the
relative genotyping costs of each strategy by evaluating the number of markers required to achieve a
0.95 accuracy of assigning the market place fish to
parents, grandparents or full sib family, or the wild
population, and assuming that the ratio of the cost of genotyping
a microsatellite to the cost of genotyping a SNP is
5:1 (Table 1). This relative cost of genotyping is based
on a comparison of capillary electrophoresis microsatellite genotyping and SNP genotyping with the
Sequenom (TM) MASS-Array system, and assuming
large numbers of each marker are to be genotyped (in
true currency the costs of genotyping are currently 10
NOK and 2 NOK approximately, for microsatellites
and SNPs respectively).
Table 1
Relative genotyping costs required to achieve >0.95 correct
assignment decisions for FS, PAR and GRAND
Strategy   Quantity            Microsatellites (100 fish     SNPs (100 fish sampled
                               sampled per commercial        per commercial
                               operation)                    operation)
FS         Number of markers   75                            400
           Cost                9,000,000                     4,800,000
PAR        Number of markers   15                            75
           Cost                135,000                       135,000
GRAND      Number of markers   50                            200
           Cost                45,000                        36,000
Cost of genotyping SNP = 1 unit, cost of genotyping microsatellites = 5 units.
The costs are incurred each time a new set of
commercial operations (FS), parents (PAR), or grandparents (GRAND) is used. As there are fewer
grandparents than parents, the cheapest strategy,
considering relative genotyping costs only, was
GRAND with SNPs. However these costs ignore the
additional cost of logistic considerations required for
traceability in each strategy (discussed below).
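The relative costs for PAR and GRAND can be reproduced with a short calculation, assuming that cost equals the number of genotyped individuals times the number of markers times the unit cost per genotype. Under that assumption the values below match the PAR and GRAND entries of Table 1. FS is left out of the sketch because its cost also depends on how many fish are sampled per commercial operation. The dictionary and function names are hypothetical.

```python
UNIT_COST = {"microsatellite": 5, "SNP": 1}       # relative units
N_GENOTYPED = {"PAR": 1800, "GRAND": 180}         # parents, grandparents
MARKERS_REQUIRED = {
    ("PAR", "microsatellite"): 15,
    ("PAR", "SNP"): 75,
    ("GRAND", "microsatellite"): 50,
    ("GRAND", "SNP"): 200,
}


def relative_cost(strategy, marker_type):
    """Genotyped individuals x markers required x unit cost."""
    return (N_GENOTYPED[strategy]
            * MARKERS_REQUIRED[(strategy, marker_type)]
            * UNIT_COST[marker_type])


for strategy in ("PAR", "GRAND"):
    for marker_type in ("microsatellite", "SNP"):
        print(strategy, marker_type, relative_cost(strategy, marker_type))
```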
4.1. Comparison of results to those from other studies
In PAR, the proportion of correct assignment
decisions depends on the number of loci, the allelic
diversity at these loci (number of alleles and the
distribution of their frequencies), the number of
offspring and the number of parents and possible
mating combinations. The conclusions from a number
of studies, with a wide range of numbers of possible
parents and offspring, indicate that between 6 and 10
microsatellite markers, with 6–10 alleles per locus, are
sufficient to accurately assign offspring to the correct
parental pair (e.g., Bernatchez and Duchesne, 2000;
Estoup et al., 1998; SanCristobal and Chevalet, 1997;
Letcher and King, 2001; Villanueva et al., 2002). Our
results roughly concur with these studies: 15 microsatellites were required to assign progeny sampled
from the market place to the correct parental pair (1800
possible parents), or exclude the possibility that the
sampled fish was the offspring of any of the 1800
parents (e.g. a fish originating from the wild), in 95% of
replicates. When SNP markers were used, the number
of markers required to achieve the same accuracy was 5
fold greater, reflecting the lower allelic diversity of
these markers. Of course there is scope here to make
some pre-selection of the SNPs used to increase their
informativeness (select those with the highest frequency of the rare allele). However the maximum
heterozygosity of SNPs is still only 0.5, much lower
than the heterozygosity of a typical microsatellite.
In GRAND, the proportion of correct assignment
decisions is dependent on similar parameters as in
PAR; the number of loci, allelic diversity at these loci,
the number of offspring and the number of grandparents and possible mating combinations among
these grandparents. In our simulation scheme, the
number of possible grandparents (180) was ten times
smaller than the number of parents (1800). However,
the number of possible matings among the grandparents to produce parents and then progeny is much
higher than among the parents to produce progeny
($180^4 = 1.05 \times 10^9$ compared with $1800^2 = 3.24 \times 10^6$,
respectively). To achieve the same level of correct
assignment decisions (0.95), approximately 3 times as
many microsatellites were required for assignment of
offspring to grandparents compared with assignment
of offspring to parents, or to the wild population.
Letcher and King (2001) used simulation to assess the
number of loci and number of alleles at these loci
required for accurate assignment of fish to parents or
grandparents in an Atlantic salmon population (though
in a much smaller population than considered here). In
Table 1 of their manuscript, 4 loci with 15 alleles were
required for 95% accuracy of assigning fish to parents,
while 16 loci with 18 alleles were required to achieve
the same accuracy when fish were assigned to grandparents. These results (4 fold increase in the number of
markers required from PAR to GRAND at the same
proportion of correct assignment decisions) are in
rough agreement with ours.
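The counts of possible combinations quoted in this section for grandparents and parents can be checked directly. The sketch simply evaluates the two powers given in the text and their ratio; the variable names are ours.

```python
N_GRANDPARENTS = 180
N_PARENTS = 1800

# Four grandparents per grand-offspring versus two parents per offspring.
grandparent_combinations = N_GRANDPARENTS ** 4
parent_combinations = N_PARENTS ** 2

print(f"{grandparent_combinations:.2e}")
print(f"{parent_combinations:.2e}")
print(grandparent_combinations // parent_combinations)
```

Although there are ten times fewer grandparents than parents, the space of grandparental combinations is a few hundred times larger, which is consistent with the larger number of markers needed for GRAND.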
The proportion of correct assignment decisions
from FS is primarily determined by two parameters.
One is the accuracy of estimating the relationship
between a fish sampled from the market place and a
fish genotyped from the commercial operation (Fig.
6). This depends on the number of markers used to
estimate the relationship. The results of Glaubitz et
al. (2003) suggest at least 16–20 microsatellites or
100 SNP markers (frequency of rare allele of 0.2)
are required to accurately (proportion of correct
assignment decisions of 0.95) determine whether
two individuals are full sibs or unrelated. Our
results indicate larger numbers of markers may be
required for the situation we evaluated. Approximately 75 microsatellites were required before the
proportion of correct decisions reached 0.95. The
large discrepancy may be a result of our sampling
scheme, where there is a chance that not all the full
sib families are represented in the sample, the
second parameter of importance. If the number of
markers is low, more individuals from each full sib
family must be included in the sample, as each
relationship between the fish sampled from the
market place to a fish sampled from the commercial
operation will not be estimated very accurately. As
the number of full sib families per commercial
operation is decreased, the sample size taken per
commercial operation can also be decreased, as the
probability of including representatives of all full
sib families increases.
Fig. 6. Proportion of correct assignment decisions from strategy FS, when the fish sampled in the market place was either from a commercial
operation or from the wild population (x-axis: number of markers, 0 to 200; y-axis: proportion of correct assignment decisions, 0 to 1; series: wild, farmed).
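The chance that a sample from one commercial operation contains at least one member of every full sib family can be computed exactly by inclusion-exclusion. The sketch below assumes equally sized families and simple random sampling without replacement, which matches the simulated design (24 families of 10 fish) but not necessarily a real farm; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_all_families_sampled(n_families, family_size, n_sampled):
    """Probability that a random sample (without replacement) of
    n_sampled fish contains at least one fish from every family,
    assuming n_families families of equal size family_size."""
    total = comb(n_families * family_size, n_sampled)
    if total == 0:
        return 0.0                      # more fish requested than exist
    acc = Fraction(0)
    for k in range(n_families + 1):
        # Samples that avoid k specified families entirely.
        ways = (comb(n_families, k)
                * comb((n_families - k) * family_size, n_sampled))
        acc += (-1) ** k * Fraction(ways, total)
    return float(acc)


for n in (25, 50, 100, 150):
    print(n, prob_all_families_sampled(24, 10, n))
```

Exact rational arithmetic is used because the alternating sum involves very large binomial coefficients that would lose precision in floating point.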
4.2. Limitations of our study
A 95% level of accuracy was used in this study as
the threshold for 'accurate assignment'. Whether this
level is sufficient will depend on the specific goals of
the traceability scheme. While a 95% level may be
sufficient for, for example, labelling fish products with
farm of origin to inform consumer choice, a 100%
level would be required in the situation where a
farmer has exposed a group of fish to a toxin, and
100% of exposed fish need to be identified. The
number of markers required for 100% accuracy of
assignment with the different strategies (if it is
attained) can be found in Figs. 3–5.
We have assumed in our simulations that all of the
DNA markers were unlinked. This will not be the case
especially with the large numbers of SNPs required in
the FS and GRAND strategies. When there is non-independence (i.e. linkage) between some of the
markers, more markers may be required to achieve
the same level of accuracy of assignment. Further
simulations are required to determine the effects of
linkage on the number of SNPs required for accurate
assignment.
The size of the aquaculture industry we were able
to simulate was limited by the computing time
required in the GRAND strategy. We were unable to
simulate the full size of a large aquaculture industry
such as the Norwegian salmon aquaculture industry:
for example the number of grandparents simulated
was 180, approximately 1/5 of the number of
grandparents used in the industry, and the number of
parents simulated was 1800, approximately 1/17 of the
number of parents of commercial
progeny used each year in the Norwegian salmon
industry. Thus, we may ask whether our results can be
applied to an industry which is at least ten times larger
than the one we have simulated. Let us consider the
PAR strategy first. Bernatchez and Duchesne (2000)
derived analytical formulas to determine the number
of loci required to reach a given level of assignment
success with different numbers of parents. Their
results indicated that increasing the number of parents
from 50 to 100 required more loci to achieve 90%
assignment success, while increasing the numbers of
parents from 100 to 300 generally did not require
more loci to achieve 90% assignment success. The
results of Letcher and King (2001) support this
conclusion: in their study 10 microsatellite loci with
6 alleles were sufficient to assign progeny to 50, 110,
210, 310 or 410 parents with greater than 95%
accuracy. Villanueva et al. (2002) reported that 10
loci with 6 alleles were sufficient to assign progeny
from crosses among either 200 or 800 parents with
close to 100% accuracy. So our result of approximately 15 microsatellite loci or 75 SNPs to achieve
95% probability of correct assignment decisions may
also hold with much larger numbers of parents.
For the strategy of assigning market fish to
grandparents of origin (GRAND), there has been no
investigation in the literature into the effect of
increasing the number of grandparents on the accuracy of assigning the grand offspring. To investigate
this, we simulated a number of populations with
smaller numbers of grandparents than that used above
(results not shown). In general, we found changing the
number of grandparents did not greatly alter the
conclusions, i.e. that approximately 50 microsatellites
and 200 SNPs are required for accurate assignment of
fish to grandparents. Letcher and King (2001), in their
simulation study, concluded that 16 loci with 18
alleles each were sufficient to correctly assign fish to
grandparents: they required fewer loci than we did,
but the number of alleles per locus in their study was
substantially higher.
So in general, we conclude that the results from our
simulations (e.g. number of loci required to achieve a
proportion of correct assignment decisions >0.95
for each strategy) should also roughly apply to
aquaculture industries using larger numbers of parents
and grandparents of commercial offspring than we
have simulated.
In all strategies, the decision to assign fish sampled
from the market place to the wild population was
based on excluding the possibility that the fish was
either a full sib (FS), offspring (PAR) or grand
offspring (GRAND) of the genotyped fish. We used
an exclusion probability of one as the criterion to assign
fish to the wild. In other words, there was a zero
probability that the fish sampled from the market
place could be offspring, grand offspring or full sib of
the sampled fish. It could be argued that this criterion
was too stringent, and using a lower exclusion
probability would allow us to correctly assign fish to
the wild more frequently. However, further investigation showed that if a fish sampled from the market
place was of wild origin, fewer markers were required
to assign it correctly than if the fish
sampled was of farmed origin. For example, in strategy
FS, 100 SNPs were sufficient to assign wild
fish to the wild population with 100% correct
decisions, while 175 SNPs were required to assign
farmed fish to the commercial operation of origin
correctly.
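The exclusion criterion described above can be sketched for the PAR case as follows. This is a minimal illustration and not the authors' simulation code: the function names, the representation of a genotype as a tuple of allele pairs, and the toy genotypes are all hypothetical, and genotyping errors and mutation are ignored. A market fish is retained for a parent pair only if, at every locus, one of its alleles could have come from the sire and the other from the dam; a fish excluded by every pair is assigned to the wild.

```python
def compatible(offspring, sire, dam):
    """True if, at every locus, the offspring's two alleles can be explained
    by one allele inherited from the sire and one from the dam."""
    for (a, b), s, d in zip(offspring, sire, dam):
        if not ((a in s and b in d) or (b in s and a in d)):
            return False  # Mendelian incompatibility: this pair is excluded
    return True


def assign(offspring, pairs):
    """Names of all parent pairs that are not excluded.
    An empty list means the fish is assigned to the wild population."""
    return [name for name, (sire, dam) in pairs.items()
            if compatible(offspring, sire, dam)]


# Toy data (hypothetical): three loci, alleles coded as integers.
pairs = {
    "pair_A": (((1, 2), (3, 3), (1, 1)), ((2, 2), (3, 4), (1, 2))),
    "pair_B": (((1, 1), (5, 5), (2, 2)), ((1, 3), (5, 6), (2, 2))),
}
farmed = ((2, 2), (3, 4), (1, 2))  # consistent with pair_A only
wild = ((4, 4), (7, 8), (3, 3))    # carries alleles absent from both pairs

print(assign(farmed, pairs))
print(assign(wild, pairs))
```

Requiring strict exclusion at every locus corresponds to the exclusion probability of one used here; a less stringent rule would tolerate a small number of mismatching loci.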
4.3. Application of strategies
Given that the ultimate objective of any traceability scheme is to trace a fish from any point in the
production chain to any other point in the production
chain, it is necessary to consider how this could be
achieved with the three strategies we have evaluated
here. For the example of the Norwegian farmed
salmon industry, the logistics are complicated considerably by the fact that while a multiplier unit
obtains the genetic material from one breeding
nucleus, a commercial producer may purchase eyed
eggs from different multiplier units. Although
GRAND has the lowest genotyping cost of any
strategy, it also has the most complicated logistical
requirements. GRAND requires that all the progeny
(i.e., the parents at the multiplier units and the
offspring at commercial units) of a pair or a set of
grandparents are kept separate both at the multiplier
and commercial levels, in order for a fish sampled
from the market place to be traced to any point in the
production chain, given that the fish can be assigned
to a set of grandparents. This is probably impractical:
owing to the very high fecundity of fish species, it would
create too many large and unmanageable fish
groups at the different levels of the production chain.
However, GRAND could still be used to discriminate
between fish of different stocks (as by allocating fish
to grandparents the nucleus of origin is implicitly
identified) or from different sub-populations (year-classes) of the same nucleus.
One alternative to DNA markers for implementation of traceability systems is labelling of product with
information on stock and farm of origin, date of
harvest and so on (Håstein et al., 2001). In practice,
traceability systems based on labelling may be
cheaper and easier to implement than systems using
DNA markers. DNA markers may still have a role to
play in such systems, however. Periodic verification
that the labelling system is accurately tracing the
production chain may be required, and this could be
independently achieved with DNA markers, using any
of the three strategies described here.
Although we have not explicitly tackled the issue
of escapees from fish farms, our results do have some
bearing on this problem. All three of our strategies
discriminate between fish from captive populations
and the wild population. The strategy with the most
immediate relevance though is perhaps FS. If fish
were sampled from a particular region, and DNA
taken from the sampled fish, the FS strategy could be
used in the first instance to determine if any of the fish
sampled were escapees from nearby fish farms. Only
fish from the farms in the region of interest would
then need to be genotyped.
The traceability strategies using DNA markers
outlined in this paper are based on the flow of the
marker genes from a few grandparents at the nuclei
level, to many parents at the multiplier level and
finally to a very high number of grow-out animals at
the commercial farm level. This structure is typical of
many aquaculture industries, including the farmed
Atlantic salmon industry in Norway, which was used as an
example in this paper. Modifications of this structure
may exist, in particular due to the special reproduction
characteristics and capacity of the actual fish species
and the size of the production output from the
industry. For example, in industries with a low output
from very highly prolific and multiple spawning
species, a sufficient number of parent fish at the
multiplier level may be recruited directly from the
nucleus. In this case, the GRAND strategy is not
valid, and either PAR (with nucleus parents genotyped) or FS must be used. The most suitable and cost
effective traceability strategy for a particular industry
will depend heavily on the organisation of that
industry, for example the degree to which transfers
of fish, eggs and larvae between tiers are recorded.
Acknowledgements
The authors are grateful for funding from the
Norwegian research council (Project number 130162/140).
Professor Theo Meuwissen is thanked for advice
on the similarity index.
References
Bernatchez, L., Duchesne, P., 2000. Individual-based genotype
analysis in studies of parentage and population assignment:
how many loci, how many alleles? Can. J. Fish Aquat. Sci. 57,
1 – 12.
Eding, J.H., Meuwissen, T.H.E., 2001. Marker based estimates of
between and within population kinships for the conservation of
genetic diversity. J. Anim. Breed. Genet. 118, 141 – 159.
Estoup, A., Gharbi, K., SanCristobal, M., Chevalet, C., Haffray, P.,
Guyomard, R., 1998. Parentage assignment using microsatellites
in turbot (Scophthalmus maximus) and rainbow trout (Oncorhynchus mykiss) hatchery populations. Can. J. Fish Aquat. Sci.
55, 715 – 725.
Gilbey, J., Verspoor, E., McLay, A., Houlihan, D., 2004. A
microsatellite linkage map for Atlantic salmon (Salmo salar).
Anim. Genet. 35 (2), 98 – 105.
Glaubitz, J.C., Rhodes, E., DeWoody, A., 2003. Prospects for
inferring pairwise relationships with single nucleotide polymorphisms. Mol. Ecol. 12, 1039 – 1047.
Hayes, B., Lærdahl, J., Lien, S., Berg, P., Davidson, W., Koop, B.,
Adzhubei, A., Høyheim, B., 2004. Detection of single nucleotide polymorphisms (SNPs) from Atlantic salmon Expressed
Sequence Tags (ESTs). Proc. Euro. Assoc. Anim. Prod. Bled,
Slovenia.
He, C., Chen, L., Simmons, M., Li, P., Kim, S., Liu, Z.J., 2003.
Putative SNP discovery in interspecific hybrids of catfish by
comparative EST analysis. Anim. Genet. 34 (6), 445 – 448.
Håstein, T., Hill, B.J., Berthe, F., Lightner, D.V., 2001. Traceability of aquatic animals. Rev. Sci. Tech. - Off. Int. Épizoot. 20,
564 – 583.
Kocher, T.D., Lee, W., Sobolewska, H., Penman, D., McAndrew,
B., 1998. A genetic linkage map of a cichlid fish, the Tilapia
(Oreochromis niloticus). Genetics 148, 1225 – 1232.
Letcher, B.L., King, T.L., 2001. Parentage and grand parentage
assignment with known and unknown matings: application to
Connecticut River Atlantic salmon restoration. Can. J. Fish
Aquat. Sci. 58, 1812 – 1821.
Marth, G., Yeh, R., Minton, M., Donaldson, R., Li, Q., Duan, S.,
Davenport, R., Miller, R.D., Kwok, P.Y., 2001. Single-nucleotide polymorphisms in the public domain: how useful are
they? Nat. Genet. 27 (4), 371 – 372.
Sakamoto, T., Danzmann, R.G., Gharbi, K., Howard, P., Ozaki, A.,
Khoo, S.K., Woram, R.A., Okamoto, N., Ferguson, M.M.,
Holm, L.E., Guyomard, R., Høyheim, B., 2000. A microsatellite
linkage map of rainbow trout (Oncorhynchus mykiss) characterized by large sex-specific differences in recombination rates.
Genetics 155 (3), 1331 – 1345.
SanCristobal, M., Chevalet, C., 1997. Error tolerant parent
identification from a finite set of individuals. Genet. Res. 70,
53 – 62.
Skaala, Ø., Høyheim, B., Glover, K., Dahle, G., 2004. Microsatellite
analysis in domesticated and wild Atlantic salmon (Salmo salar
L.): allelic diversity and identification of individuals. Aquaculture 240, 131 – 143.
Smith, C.T., Templin, W.D., Seeb, J.E., Seeb, L.W., 2003. Nuclear
and mitochondrial SNPs provide high-throughput resolution for
migratory studies of Chinook salmon.
Villanueva, B., Verspoor, E., Visser, P.M., 2002. Parental assignment in fish using microsatellite markers with finite numbers of
parents and offspring. Anim. Genet. 33, 33 – 41.