Long-range phasing and use of crossbred data in genomic selection
B.P. Kinghorn, J.M. Hickey and J.H.J. van der Werf
School of Environmental and Rural Science, University of New England, Australia.
Phasing the genotypes of individuals can lead to identification of the sperm and egg
contributions to these genotypes. Working with crossbred phenotypes and genotypes, phasing
gives us the basis to determine the differences between sperm and eggs in what constitutes
favourable genetic contributions: The best set of alleles for the sperm to carry is generally not
the best set for the egg to carry, and the difference relates to the expression of heterosis.
As it stands, this statement relies on the presence of at least some overdominance, such that
the best genotype for a crossbred individual includes at least some heterozygotes. However,
as illustrated later, a policy of separately evaluating sperm and egg contributions under
genomic selection can lead to notable benefits even when there is no overdominance.
The concept of long-range phasing (LRP) is based on diagnosis of haplotype sharing between
individuals that can be distantly related. This diagnosis is based on the absence of
opposing homozygotes between the two individuals over a sufficiently long region of the
genome. For each individual to be phased, we find many such partners, and split these into
those on the paternal and maternal sides of the individual’s pedigree, or into two exclusive
groups where pedigree is not known. This leads to diagnosis of direction of inheritance for
heterozygotes in the individual, and thus phasing, and inference of haplotypes and identity by
descent (IBD). This exercise is simpler for crossbred individuals, because each side of the
pedigree is already determined by breed. A few hundred individuals genotyped in each
parental breed should give a strong basis for phasing crossbred individuals. Given genome-wide
phasing, we can calibrate crossbred phenotypes separately to the sperm and egg alleles.
This gives a separate genomic prediction equation for each parental breed, to be used within
each of these parental breeds. Alternatively, crossbred phenotypes and crossbred
gametic information (scored 0, 1) can be analysed simultaneously with genotypes for the
prevailing purebred (scored 0, 1, 2) in a mixed model framework to implement a single-step
genomic evaluation.
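The opposing-homozygote test behind this diagnosis can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes genotypes coded 0, 1, 2 (copies of an arbitrary reference allele), and the function names, the fixed window and the toy genotypes are all hypothetical. The last function shows how a partner assumed to share the paternal haplotype can resolve the direction of inheritance at a heterozygous locus.

```python
def opposing_homozygotes(g1, g2):
    """Count loci where one individual is homozygous 0 and the other homozygous 2."""
    return sum(1 for a, b in zip(g1, g2) if {a, b} == {0, 2})


def may_share_haplotype(g1, g2, start, length):
    """True if no opposing homozygotes occur in the window [start, start + length)."""
    stop = start + length
    return opposing_homozygotes(g1[start:stop], g2[start:stop]) == 0


def paternal_allele(genotype, paternal_partner_genotype):
    """Infer the paternally inherited allele (0 or 1) at one locus, or None.

    Assumes the partner shares the individual's paternal haplotype at this locus.
    """
    if genotype in (0, 2):                   # homozygote: both gametes carry the same allele
        return genotype // 2
    if paternal_partner_genotype in (0, 2):  # homozygous partner reveals the shared allele
        return paternal_partner_genotype // 2
    return None                              # both heterozygous: not informative


# Toy genotypes (hypothetical): the two individuals are compatible over the
# first five loci, but are opposing homozygotes at the sixth.
individual = [0, 1, 2, 2, 1, 0]
partner = [1, 1, 2, 1, 0, 2]

shares_first_five = may_share_haplotype(individual, partner, 0, 5)
shares_all_six = may_share_haplotype(individual, partner, 0, 6)
phased_paternal = [paternal_allele(g, p) for g, p in zip(individual[:5], partner[:5])]
```

In practice many partners would be combined per region, so that loci left unresolved by one partner (the `None` entries) can be resolved by another.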
The resulting reciprocal recurrent genomic selection (RRGS) selects within the pure breeds
but targets the total genetic merit in crossbreds, including both heterosis and performance in
the crossbred production environment. This gives a major reduction in generation interval
compared to classic reciprocal recurrent selection where each individual is evaluated through
performance of its crossbred progeny.
Kinghorn et al. (2010) showed considerable gains in crossbred response to RRGS compared
to within-line genomic selection (WLGS), with a reduction in purebred performance under
RRGS. These analyses used ideal conditions for genomic selection (full accuracy) and
needed to use high levels of dominance to generate typical heterosis (10.8%) for the number
of QTL simulated (2000 QTL ≈ 566 QTL of equal effect). The current study uses different
numbers of QTL affecting the trait of interest, different levels of dominance, and a larger
effective population size to reduce impact of inbreeding on increased heterosis.
As noted by Kinghorn et al. (2010), RRGS results in considerably increased heterosis,
reflected in higher crossbred means but also lower purebred means. The current work shows
that differences are stronger with more dominance and with more QTL affecting the trait. It is
notable that there are considerable effects of RRGS over WLGS for large numbers of QTL,
even when there is no overdominance. This is essentially due to the effect of drift. Under
WLGS, heterosis tends to increase due to drift, which leads to increased homozygosity in pure
lines and maintained heterozygosity in crosses. (This is equivalent to recovery of merit lost
due to inbreeding depression.) Under RRGS, even incomplete dominance is exploited to
‘influence’ this drift. Selection adds to the drift effect by tending to increase frequencies of
favourable alleles, and even partial dominance has an influence towards divergence of allele
frequencies in the parental lines (i.e. more likely to ‘drift’ apart than under WLGS). Drift
plays a much bigger role when the number of QTL is high. For example, after 20 generations
of selection, mean favourable allele frequency increases from 0.5 to 0.68 for 100 QTL, but
from 0.5 to 0.51 for 10,000 QTL (Table 3). It is under the latter circumstance that partial
dominance can play a stronger role to affect changes in allele frequencies, and this is reflected
in the much higher heterosis for RRGS / WLGS: 55.5% / 8.64% for 10,000 QTL compared to
4.2% / 1.4% for 100 QTL (Table 1).
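The link between divergence of allele frequencies and heterosis under partial dominance can be checked with a single-locus calculation. The sketch below is not taken from the simulation itself; it assumes the standard single-locus model with genotypic values -a, d and +a, random mating within each line, and hypothetical values for the frequencies and effects. Under these assumptions the crossbred mean exceeds the mid-parent mean by d(p1 - p2)^2, so heterosis arises whenever the lines differ in allele frequency, even when d is smaller than a.

```python
def line_mean(p, a, d):
    """Mean genotypic value of a random-mating line with favourable allele frequency p."""
    q = 1.0 - p
    return a * (p - q) + 2.0 * d * p * q


def cross_mean(p1, p2, a, d):
    """Mean genotypic value of the cross between lines with frequencies p1 and p2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return a * (p1 * p2 - q1 * q2) + d * (p1 * q2 + p2 * q1)


def heterosis(p1, p2, a, d):
    """Crossbred mean minus the mid-parent mean at one locus."""
    mid_parent = 0.5 * (line_mean(p1, a, d) + line_mean(p2, a, d))
    return cross_mean(p1, p2, a, d) - mid_parent


# Hypothetical locus with partial dominance (d = 0.5 * a).
no_divergence = heterosis(0.5, 0.5, a=1.0, d=0.5)    # lines at equal frequency
with_divergence = heterosis(0.6, 0.4, a=1.0, d=0.5)  # lines drifted apart
```

With equal frequencies the heterosis is zero; with frequencies of 0.6 and 0.4 it equals 0.5 × 0.2² = 0.02 in units of a.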
It is concluded that RRGS could be an attractive strategy where genomic EBVs can be
accurately evaluated, and where either the number of QTL is high or overdominance is
important. An interim test of RRGS does not require running a breeding program under RRGS.
Given availability of purebred genotypes (for LRP) and crossbred phenotypes and genotypes,
it is possible to set up and compare genomic prediction equations (or genomic relationship
matrices) for both WLGS and RRGS. The two equations for RRGS (one calibrated on sperm,
the other on eggs) are expected to differ due to random sampling effects and systematic
effects of QTL dominance, as influenced by differences in allele frequency between the pure
lines. If these two types of effect can be handled appropriately, inference can be made about
the likely value of RRGS, without having to invoke RRGS.
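To illustrate what calibrating on sperm and on eggs means at one locus, the sketch below contrasts mean crossbred phenotype between carriers of the two alleles, separately for the sperm and the egg contribution. It is a deliberately simplified stand-in for a genomic prediction equation (a single-marker contrast, with no other loci or fixed effects fitted), and the function name and toy data are hypothetical.

```python
def allele_contrast(gametic_scores, phenotypes):
    """Mean phenotype of crossbreds whose gamete carried allele 1 minus those carrying 0."""
    with_one = [y for s, y in zip(gametic_scores, phenotypes) if s == 1]
    with_zero = [y for s, y in zip(gametic_scores, phenotypes) if s == 0]
    return sum(with_one) / len(with_one) - sum(with_zero) / len(with_zero)


# Toy phased data for four crossbreds at one locus (hypothetical values).
sperm_allele = [1, 1, 0, 0]        # gametic score from the sire breed
egg_allele = [1, 0, 1, 0]          # gametic score from the dam breed
phenotype = [10.0, 9.0, 7.0, 6.0]

sperm_effect = allele_contrast(sperm_allele, phenotype)
egg_effect = allele_contrast(egg_allele, phenotype)

# The unphased genotype (scored 0, 1, 2) is the sum of the two gametic scores.
genotype = [s + e for s, e in zip(sperm_allele, egg_allele)]
```

In this toy example the same allele has a larger estimated effect when delivered by sperm than by egg, which is the kind of difference the two RRGS equations are expected to capture.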
KINGHORN, B.P., HICKEY, J.M. and VAN DER WERF, J.H.J. (2010) Reciprocal
recurrent genomic selection for total genetic merit in crossbred individuals. Proceedings of
the 9th World Congress on Genetics Applied to Livestock Production. Paper 0036.
7th European Symposium on Poultry Genetics, 5th‐7th October 2011 (Peebles Hydro)
Simulation result tables: 80 sires and 160 dams per breed and crossbreed. Dominance
deviation for each locus sampled uniformly between 0 and a, where 2a is the difference
between homozygotes for that locus. Genetic values scaled to give h²=0.25. Other conditions
as for Kinghorn et al. (2010).
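The two sampling conditions stated above can be sketched as follows. This is an illustration under assumptions, not the simulation code used for the tables: the distribution of the additive effects a, the stand-in genetic values, the unit environmental variance and the function names are all hypothetical, and h² is treated here as the ratio of the variance of total genetic values to phenotypic variance.

```python
import random
import statistics


def sample_dominance_deviations(additive_effects, rng):
    """Draw d uniformly between 0 and a for each locus (2a = difference between homozygotes)."""
    return [rng.uniform(0.0, a) for a in additive_effects]


def scale_genetic_values(values, h2=0.25, env_var=1.0):
    """Rescale genetic values so that var(G) / (var(G) + env_var) equals h2."""
    target_var = env_var * h2 / (1.0 - h2)
    factor = (target_var / statistics.pvariance(values)) ** 0.5
    return [factor * v for v in values]


rng = random.Random(1)

# Hypothetical additive effects for 100 loci, then one dominance deviation per locus.
a_effects = [abs(rng.gauss(0.0, 1.0)) for _ in range(100)]
d_effects = sample_dominance_deviations(a_effects, rng)

# Stand-in genetic values for 500 individuals, rescaled to the target heritability.
raw_values = [rng.gauss(0.0, 3.0) for _ in range(500)]
scaled_values = scale_genetic_values(raw_values)
var_g = statistics.pvariance(scaled_values)
realised_h2 = var_g / (var_g + 1.0)
```

Because d never exceeds a, every locus shows between no dominance and complete dominance, and none shows overdominance.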
Table 1. Selection responses.

                              Generation 5                      Generation 20
Number  Effective  Selection  Purebred  Crossbred  Heterosis    Purebred  Crossbred  Heterosis
of QTL  number     type       mean      mean       percent      mean      mean       percent
        of QTL
10      4.9        RRGS       11.37     11.51      1.19         11.54     11.6       0.59
                   WLGS       11.41     11.49      0.78         11.54     11.6       0.55
100     29.6       RRGS       12.19     12.53      2.82         13.2      13.76      4.21
                   WLGS       12.29     12.46      1.35         13.54     13.72      1.36
1000    293.9      RRGS       13.73     14.67      6.85         14.73     17.32      17.58
                   WLGS       13.94     14.52      4.22         16.07     16.87      4.93
10000   2776.4     RRGS       17.70     20.77      17.33        15.65     24.31      55.45
                   WLGS       18.58     20.19      8.67         20.52     22.29      8.64
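The heterosis percentages in Table 1 appear consistent with expressing the crossbred advantage relative to the purebred mean. That definition is an assumption here, since it is not stated explicitly, and the function name is hypothetical. Values recomputed from the rounded means agree with the tabulated percentages only approximately.

```python
def heterosis_percent(purebred_mean, crossbred_mean):
    """Crossbred advantage as a percentage of the purebred mean (assumed definition)."""
    return 100.0 * (crossbred_mean - purebred_mean) / purebred_mean


# Means for 10,000 QTL taken from Table 1.
wlgs_gen5 = heterosis_percent(18.58, 20.19)    # tabulated: 8.67
rrgs_gen20 = heterosis_percent(15.65, 24.31)   # tabulated: 55.45
wlgs_gen20 = heterosis_percent(20.52, 22.29)   # tabulated: 8.64
```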
Table 2. Generation 5 allele frequencies and heterozygosity proportions.

Number  Selection  Purebred mean     Crossbred mean    Purebred mean   Crossbred mean
of QTL  type       favourable        favourable        heterozygosity  heterozygosity
                   allele frequency  allele frequency
10      RRGS       0.77              0.77              0.07            0.16
        WLGS       0.78              0.78              0.08            0.15
100     RRGS       0.56              0.56              0.14            0.21
        WLGS       0.55              0.55              0.15            0.20
1000    RRGS       0.52              0.52              0.15            0.21
        WLGS       0.52              0.52              0.16            0.21
10000   RRGS       0.50              0.50              0.15            0.22
        WLGS       0.51              0.51              0.17            0.21

Table 3. Generation 20 allele frequencies and heterozygosity proportions.

Number  Selection  Purebred mean     Crossbred mean    Purebred mean   Crossbred mean
of QTL  type       favourable        favourable        heterozygosity  heterozygosity
                   allele frequency  allele frequency
10      RRGS       0.95              0.96              0               0.05
        WLGS       0.96              0.96              0               0.04
100     RRGS       0.67              0.67              0.05            0.19
        WLGS       0.68              0.68              0.08            0.16
1000    RRGS       0.55              0.55              0.07            0.23
        WLGS       0.55              0.55              0.13            0.20
10000   RRGS       0.51              0.51              0.06            0.25
        WLGS       0.51              0.51              0.16            0.20