Part 2 - Guernsey NZ

Lessons learnt in implementation of genomic selection in breeding dairy cattle
for New Zealand. What are the implications for numerically smaller breeds such
as Guernsey?
Bevin Harris, Science Leader, LIC, Hamilton, New Zealand
Introduction
LIC has been investing in DNA technology since the early 1990s. In the mid 1990s, the first
detection of quantitative trait loci (QTL) in dairy cattle was reported (Georges et al., 1995).
In New Zealand (NZ), the first application of DNA information in the LIC breeding scheme
was parentage testing in the mid 1990s, using microsatellite markers that cost
approximately $3 (NZ) per marker to analyse. LIC invested in a QTL discovery programme
that yielded a small number of QTL for the milk production traits. These QTL were used
via marker assisted selection (MAS) in 1998 and 1999 within the LIC breeding programme,
until it was determined that the cost of undertaking MAS was greater than its
economic return (Spelman, 2002).
Application of genomic information in dairy cattle breeding schemes moved from MAS to
the use of high-density marker panels within the breeding scheme, which is commonly
referred to as genomic selection (Meuwissen et al., 2001). Genomic selection has become a
key component of most dairy cattle breeding schemes over the past 8 years (Pryce and
Daetwyler, 2012), including LIC's breeding scheme, with mainly young sires being routinely
evaluated and selected on the basis of their genomic profile combined with pedigree
information. Genomic breeding values are now widely used for the selection of young dairy
sires. In most countries, the genomic predictions come from within-breed analyses using a
single-breed reference population.
Unlike other countries, NZ has a large crossbred population as well as purebred
populations. Crossbreeding in the NZ dairy industry has been steadily increasing since the
early 1980s, mainly between the Holstein Friesian and Jersey breeds. In the 2014/15
season, crossbred heifers made up 54% of the heifers entering the herd, compared with
35% Holstein Friesian and 10% Jersey. In 2001, progeny-tested crossbred sires became
available to the NZ industry. The recorded NZ dairy cattle population has been evaluated
using a multi-breed animal model since 1996, allowing direct comparison of the breeding
values of all animals regardless of breed or breed mix.
The incorporation of genomic data poses additional challenges in a multi-breed
analysis. Genomic relationships are a function of allele frequencies, which may differ
among breeds because of their different origins and selection pressures.
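To make the dependence on allele frequencies concrete, the sketch below uses VanRaden's (2008) first method, in which each genotype is centred on twice the allele frequency. The document does not say which formulation LIC uses, so this is an assumption, and the genotypes are invented toy data. The same pair of animals is compared under pooled (across-breed) and breed-specific allele frequencies.

```python
import numpy as np


def vanraden_grm(genotypes: np.ndarray, allele_freqs: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix following VanRaden (2008), method 1.

    genotypes: animals x SNPs, coded as 0/1/2 copies of the counted allele.
    allele_freqs: frequency of the counted allele at each SNP.
    """
    centred = genotypes - 2.0 * allele_freqs                    # subtract expected genotype 2p
    scale = 2.0 * np.sum(allele_freqs * (1.0 - allele_freqs))   # sum of 2p(1-p) over SNPs
    return centred @ centred.T / scale


# Hypothetical toy data: rows 0-1 are "breed A", rows 2-3 are "breed B".
genotypes = np.array([
    [2, 2, 1, 0, 1],
    [2, 1, 2, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 1, 0, 2, 2],
], dtype=float)

p_pooled = genotypes.mean(axis=0) / 2.0        # frequencies across both breeds
p_breed_a = genotypes[:2].mean(axis=0) / 2.0   # frequencies within breed A only

G_pooled = vanraden_grm(genotypes, p_pooled)
G_breed_a = vanraden_grm(genotypes, p_breed_a)
print(round(G_pooled[0, 1], 3), round(G_breed_a[0, 1], 3))  # prints: 0.8 -0.667
```

With pooled frequencies the two breed-A animals look closely related (0.8); with breed-A frequencies the same pair comes out at about -0.67, because the alleles they share are common within their own breed. The choice of base frequencies therefore matters in a multi-breed analysis.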
Since the introduction of genomics in 2008, there has been a marked increase in the
number of animals genotyped and in the scale of the genomic resources used to improve
genomic selection. This paper outlines the experiences, challenges and progress made in
the LIC genomic selection programme from 2008 to the present. It also discusses the
challenges of implementing genomic selection programmes in small dairy cattle
populations.
LIC genomic selection from 2008 through to 2016
The initial implementation of genomic selection
Meuwissen et al. (2001) first proposed the theory of genomic selection. However, it was
not until 2007, after the sequencing of the bovine genome had been completed (Kappes
et al., 2006) and the Illumina 50K Bovine SNP chip had been released, that genomic selection
became a reality for commercial breeding schemes. In NZ in 2007, the cost of genotyping
dropped from $3 (NZ) per marker for a microsatellite to less than 1 cent per marker when
tens of thousands of SNP were genotyped in parallel. Genomic selection had become
technically and financially feasible.
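The size of that price change is easiest to see per animal. A back-of-envelope sketch using the per-marker prices quoted above, assuming a nominal 50,000 markers per animal and ignoring fixed laboratory costs (prices are kept in integer cents to avoid rounding noise):

```python
# Prices from the text: about NZ$3 per microsatellite marker (mid 1990s) and
# less than 1 cent per SNP in 2007, so the SNP figure below is an upper bound.
MICROSAT_CENTS_PER_MARKER = 300
SNP_CENTS_PER_MARKER_MAX = 1
N_MARKERS = 50_000  # assumption: nominal size of a "50K" SNP panel


def panel_cost_nzd(n_markers: int, cents_per_marker: int) -> float:
    """Cost in NZ dollars of typing n_markers at a fixed per-marker price."""
    return n_markers * cents_per_marker / 100


microsat_cost = panel_cost_nzd(N_MARKERS, MICROSAT_CENTS_PER_MARKER)
snp_cost_max = panel_cost_nzd(N_MARKERS, SNP_CENTS_PER_MARKER_MAX)
print(f"microsatellites: NZ${microsat_cost:,.0f}; SNP chip: under NZ${snp_cost_max:,.0f}")
```

At these prices a 50K-marker profile would have cost about NZ$150,000 per animal with microsatellites, against under NZ$500 on a SNP chip, a reduction of more than 300-fold.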
In 2007, LIC genotyped approximately 2,400 Holstein Friesian, 1,500 Jersey and 650
crossbred sires on the Illumina 50k Bovine SNP chip. These were essentially all the
historical sires used in the LIC breeding scheme that had DNA available for genotyping
and all sires that were in the current breeding scheme as at 2007. The initial genomic
selection research validated the genomic breeding values on 3 years of progeny test sires
(not included in the training population). The correlations between the genomic breeding
values and the breeding values from daughter information ranged from 0.50 to 0.72. It
was also determined that using the Holstein-Friesian sires to predict Jersey genomic
breeding values and vice-versa produced much poorer results than using all breeds and
crossbreds in one genomic selection model. At the time, the NZ reference population of
approximately 3,500 sires was considered large.
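The validation design described above (predict the youngest sires, which are excluded from training, then correlate their genomic breeding values with their daughter-based breeding values) can be sketched as follows. Everything here is an assumption for illustration: the data are simulated, the model is a plain ridge-regression SNP-BLUP rather than LIC's production model, and the shrinkage parameter is derived from assumed variances.

```python
import numpy as np

rng = np.random.default_rng(2008)

# Simulated (hypothetical) data: 600 sires, 2,000 SNP, 50 of which carry QTL effects.
n_sires, n_snp, n_qtl = 600, 2_000, 50
freqs = rng.uniform(0.05, 0.5, n_snp)
X = rng.binomial(2, freqs, size=(n_sires, n_snp)).astype(float)
effects = np.zeros(n_snp)
effects[rng.choice(n_snp, n_qtl, replace=False)] = rng.normal(0.0, 1.0, n_qtl)
tbv = X @ effects
tbv = (tbv - tbv.mean()) / tbv.std()          # true breeding values, variance 1
dbv = tbv + rng.normal(0.0, 0.5, n_sires)     # daughter-based BV, measured with error
birth_year = rng.integers(2000, 2010, n_sires)  # birth years 2000..2009


def forward_split(years: np.ndarray, n_validation_years: int = 3):
    """Hold out the youngest birth cohorts, mimicking not-yet-proven sires."""
    cutoff = years.max() - n_validation_years + 1
    validation = years >= cutoff
    return ~validation, validation


def snp_blup(X_train: np.ndarray, y_train: np.ndarray, lam: float):
    """Ridge regression on SNP, solved in n x n form (efficient when SNP >> animals)."""
    mu = X_train.mean(axis=0)
    Z = X_train - mu
    y_mean = y_train.mean()
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(len(y_train)), y_train - y_mean)
    beta = Z.T @ alpha                        # SNP effect estimates

    def predict(X_new: np.ndarray) -> np.ndarray:
        return y_mean + (X_new - mu) @ beta

    return predict


train, valid = forward_split(birth_year)
# lambda = error variance / per-SNP variance, assuming error variance 0.25 and
# genetic variance 1 spread evenly over all SNP.
lam = 0.25 * 2.0 * np.sum(freqs * (1.0 - freqs))
predict = snp_blup(X[train], dbv[train], lam)
gbv_valid = predict(X[valid])
r = np.corrcoef(gbv_valid, dbv[valid])[0, 1]
print(f"validation sires: {valid.sum()}, correlation: {r:.2f}")
```

The essential design choice is that the validation sires contribute nothing to the SNP effect estimates, so the correlation measures predictive ability rather than goodness of fit.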
In the 2008 season, teams of genomically selected sires were made available to NZ farmers
alongside the teams of progeny test sires. The progeny test scheme was reduced from 300
bulls per annual intake to 160 bulls, because a large number of young bulls could now be
screened on their genomic breeding values. The team of genomically selected sires included
both yearling and 2-year-old bulls. The sire analysts aggressively used the young genomic
sires as sires of sons for future progeny test intakes.
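One way to see why screening many genotyped calves matters is the selection intensity under truncation selection, i = phi(x)/p, which feeds the breeder's equation R = i r sigma_A. A minimal sketch, assuming normally distributed breeding values and a large candidate pool; the 160-from-about-2,000 figures are those reported for the LIC scheme, while the 300 comparison and the accuracy and sigma_A values are hypothetical:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def selection_intensity(proportion_selected: float) -> float:
    """Truncation selection intensity i = phi(x) / p, x being the standard-normal cut-off."""
    x = STD_NORMAL.inv_cdf(1.0 - proportion_selected)
    return STD_NORMAL.pdf(x) / proportion_selected


def response(intensity: float, accuracy: float, sigma_a: float) -> float:
    """Breeder's equation: expected superiority of the selected group, R = i * r * sigma_A."""
    return intensity * accuracy * sigma_a


i_160 = selection_intensity(160 / 2000)   # about 160 bulls from roughly 2,000 calves
i_300 = selection_intensity(300 / 2000)   # hypothetical: keeping 300 from the same pool
print(f"i = {i_160:.2f} (160 selected) vs {i_300:.2f} (300 selected)")
# Illustrative only: accuracy 0.6 and sigma_A = 1 are assumed, not reported values.
print(f"R = {response(i_160, 0.6, 1.0):.2f} genetic standard deviations")
```

Selecting fewer bulls from the same genotyped pool raises the intensity (about 1.86 versus 1.55 here), but the realised gain also scales with the accuracy r of the genomic screening.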
The initial daughter results from genomic selection
By 2010-2011 the teams of genomically selected sires were getting breeding values based
on their daughter information. These daughter-based breeding values showed that the initial
genomic breeding values had been biased upwards and that their accuracies were lower than
predicted. The parent average breeding values were also shown to be biased upwards,
which further compounded the genomic breeding value bias. The genomic breeding value
inflation values (the inverse of the bias) were around 0.7, i.e. the genomic breeding values
were about 30% too high. The lower accuracies were in part due to over-fitting of the SNP
markers by the genomic selection model. Over-fitting occurs when a large number of
effects is estimated from a small amount of data: the genomic breeding value model was
attempting to estimate approximately 44,000 SNP effects from 3,500 sires. The performance
of the reduced progeny test scheme (160 sires) depended on the ability to screen a large
number of young bulls on their genomic breeding values. The 160 bulls are selected
from approximately 2,000 bull calves with genomic information. The initial pre-selection
used a custom-made 384 SNP panel in the first year and a 50K panel for the
next four years. The pre-selection process is only as good as the accuracy of the genomic
selection model. A number of initiatives were undertaken to improve the accuracy of the
genomic selection model:
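The over-fitting problem can be demonstrated on a small scale. In this illustrative sketch the sizes are hypothetical (500 SNP against 50 animals rather than about 44,000 against 3,500), the phenotypes are pure noise, and ridge regression stands in for the shrinkage used by genomic selection models in general, not for LIC's specific model.

```python
import numpy as np

rng = np.random.default_rng(44)

# Far more SNP effects than records, as in the situation described in the text.
n_animals, n_snp = 50, 500
X = rng.binomial(2, 0.5, size=(n_animals, n_snp)).astype(float)
y = rng.normal(size=n_animals)  # pure noise: no SNP has any real effect

# Ordinary least squares (minimum-norm solution) reproduces the training data exactly.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_ols = float(np.sum((y - X @ beta_ols) ** 2))

# Ridge regression shrinks effects towards zero, so it does not reproduce the noise exactly.
lam = 100.0  # assumed shrinkage parameter, for illustration only
beta_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n_animals), y)
rss_ridge = float(np.sum((y - X @ beta_ridge) ** 2))

# New animals from the same effect-free population: the perfect training fit predicts nothing.
X_new = rng.binomial(2, 0.5, size=(n_animals, n_snp)).astype(float)
y_new = rng.normal(size=n_animals)
r_new = np.corrcoef(X_new @ beta_ols, y_new)[0, 1]
print(f"rank={np.linalg.matrix_rank(X)}, OLS training RSS={rss_ols:.1e}, "
      f"ridge training RSS={rss_ridge:.1f}, OLS r on new animals={r_new:.2f}")
```

Because the genotype matrix has at most as many independent rows as animals, least squares can fit any phenotype perfectly, noise included; a perfect training fit therefore says nothing about predictive accuracy.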
1. Genotyping of 3,000 animals on the Illumina 777K panel. It was considered that a
higher density of SNP markers would improve the accuracy of genomic selection,
particularly for the crossbred population, since there would be a greater probability
than with the 50K panel that at least one SNP marker would be associated with each QTL
in the different breeds.
2. Increasing the size of the training population. Two approaches were used: genotype
swapping with Australia, Ireland and CRV Ambreed, and genotyping progeny test cows.
By 2011, approximately 14,000 cows had been genotyped.
3. Controlling the biases. Statistical methods were developed to control the biases;
however, these methods do not change the accuracy of genomic selection.
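The point that bias corrections do not change accuracy can be checked directly. A minimal sketch, assuming that inflation is measured as the slope of daughter-based breeding values regressed on genomic breeding values and that the correction is a simple linear rescaling; the text does not describe LIC's actual methods, and the data are simulated.

```python
import numpy as np


def regression_slope(dbv: np.ndarray, gbv: np.ndarray) -> float:
    """Slope b of daughter-based BV on genomic BV; b < 1 means the GBV are too dispersed."""
    return float(np.cov(dbv, gbv)[0, 1] / np.var(gbv, ddof=1))


def rescale(gbv: np.ndarray, b: float) -> np.ndarray:
    """Shrink GBV deviations around their mean by the estimated slope b."""
    return gbv.mean() + b * (gbv - gbv.mean())


rng = np.random.default_rng(7)
tbv = rng.normal(size=500)                                  # simulated true breeding values
dbv = tbv + rng.normal(0.0, 0.4, size=500)                  # daughter-based BV
gbv = (0.7 * tbv + rng.normal(0.0, 0.5, size=500)) / 0.6    # deliberately inflated GBV

b_before = regression_slope(dbv, gbv)
r_before = np.corrcoef(dbv, gbv)[0, 1]
gbv_adj = rescale(gbv, b_before)
b_after = regression_slope(dbv, gbv_adj)
r_after = np.corrcoef(dbv, gbv_adj)[0, 1]
print(f"b: {b_before:.2f} -> {b_after:.2f}; r: {r_before:.2f} -> {r_after:.2f}")
```

The rescaling brings the slope to 1 but, being a positive linear transformation, leaves the correlation (and hence the ranking of animals) exactly as it was. Here b is estimated and checked on the same sires purely for demonstration.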
Using the Illumina 777K panel in the genomic selection model produced similar results to
the 50K panel (Harris et al., 2011). The main problem was that the training populations
used in these studies were too small to have the statistical power to exploit the
increase in marker density.
Inclusion of sire genotypes from Australia and Ireland produced no increase in accuracy
in NZ. Most of these sires had little or no phenotype information in NZ, so their Interbull
breeding values had to be used as a proxy NZ phenotype. Any gain in accuracy from the
larger training population was nullified because the between-country genetic correlations
are less than unity, which reduces the accuracy of these sires' proxy phenotypes. However,
the inclusion of genotypes from CRV Ambreed, which had NZ phenotypes, improved accuracy
by approximately 3% (Spelman et al., 2012).
The inclusion of cow genotypes improved accuracy and reduced bias (Harris et al., 2013).
The accuracies from the genomic selection model validations improved by 5%, and the
inflation of the genomic breeding values was reduced by 20 to 30%.
Increasing the accuracy of genomic selection by sequencing (towards the future) and new statistical models
In 2012, LIC undertook whole-genome sequencing of 600 Holstein-Friesian, Jersey and
crossbred dairy animals, with the objective of increasing the rate of genetic improvement
through more accurate genomic prediction. At the same time, increasing numbers of
females were genotyped; by 2016, approximately 120,000 cows had been genotyped.
The whole-genome sequencing produced a pool of 18 million SNP variants, which can be
imputed into the existing 120,000 genotyped animals. The imputed genotypes can then be
used in genome-wide association mapping to identify causal variants, or markers in strong
association with the causal variants. The aim is to use the selected markers to increase
the accuracy of genomic selection.
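At its simplest, genome-wide association mapping tests each variant separately. The sketch below is a minimal illustration with simulated data and hypothetical sizes; it accepts allele dosages, so imputed genotypes can be used directly, but it ignores relatedness and population structure, which a real analysis of cattle data would need to model (for example with a mixed model).

```python
import math

import numpy as np


def single_snp_scan(dosages: np.ndarray, phenotypes: np.ndarray):
    """Regress the phenotype on each SNP in turn (allele dosage 0..2).

    Returns per-SNP effect estimates, t statistics and two-sided p-values
    (normal approximation, adequate for large samples).
    """
    n = dosages.shape[0]
    x = dosages - dosages.mean(axis=0)
    y = phenotypes - phenotypes.mean()
    sxx = np.sum(x * x, axis=0)                 # assumes no monomorphic SNP (sxx > 0)
    beta = x.T @ y / sxx
    rss = np.sum(y * y) - beta ** 2 * sxx       # residual sum of squares for each SNP
    se = np.sqrt(rss / (n - 2) / sxx)
    t = beta / se
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return beta, t, p


rng = np.random.default_rng(12)
n_animals, n_snp, causal = 1_000, 200, 42       # hypothetical; SNP 42 plays the causal variant
freqs = rng.uniform(0.1, 0.5, n_snp)
dosages = rng.binomial(2, freqs, size=(n_animals, n_snp)).astype(float)
phenotypes = 1.0 * dosages[:, causal] + rng.normal(size=n_animals)  # each copy adds 1.0 unit

beta, t, p = single_snp_scan(dosages, phenotypes)
top = int(np.argmax(np.abs(t)))
print(f"top SNP: {top}, p = {p[top]:.1e}")
```

In this simulation SNP are independent, so only the causal SNP stands out; in real data, markers in linkage disequilibrium with a causal variant show association too, which is why the text speaks of causal variants "or markers in strong association" with them.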
In addition, RNA sequencing of mammary tissue from 350 lactating cows has been
undertaken to augment the identification of causal variants. This allows the identification
of SNP variants located in genes involved in lactation, which can help identify causal
variants within our NZ populations. Meuwissen and Goddard (2010) reported from
simulations that increasing marker density, and also including causative mutations from
the whole-genome sequence, can further increase the accuracy of genomic selection.
Another advantage of the whole-genome sequence data is the ability to identify recessive
deleterious genes. To date, three recessive deleterious variants have been discovered in the
New Zealand dairy cattle population. Their discovery has enabled LIC to include this
information in its software to reduce the frequency of carrier-to-carrier matings, and thus
the frequency of affected offspring.
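The logic of avoiding carrier-to-carrier matings reduces to Mendelian probabilities: an affected calf needs a copy of the variant from each parent, and a heterozygous carrier transmits it with probability 1/2. A minimal sketch, assuming carriers are heterozygous (no affected parents) and using a hypothetical risk threshold, animal names and variant name; it is not LIC's software.

```python
from dataclasses import dataclass, field


@dataclass
class Animal:
    name: str
    # Probability that the animal is a heterozygous carrier, per recessive variant.
    # Variants not listed are treated as non-carrier.
    carrier_prob: dict = field(default_factory=dict)


def prob_affected(p_sire_carrier: float, p_dam_carrier: float) -> float:
    """Carrier x carrier gives 1/4 affected calves (each parent transmits with prob. 1/2)."""
    return p_sire_carrier * p_dam_carrier * 0.25


def mating_risk(sire: Animal, dam: Animal) -> dict:
    """Probability of an affected calf for every variant either parent might carry."""
    variants = set(sire.carrier_prob) | set(dam.carrier_prob)
    return {v: prob_affected(sire.carrier_prob.get(v, 0.0), dam.carrier_prob.get(v, 0.0))
            for v in variants}


def acceptable(sire: Animal, dam: Animal, max_risk: float = 0.01) -> bool:
    """Hypothetical rule: reject a mating if any variant's risk exceeds max_risk."""
    return all(risk <= max_risk for risk in mating_risk(sire, dam).values())


carrier_bull = Animal("bull_1", {"variant_A": 1.0})
clear_bull = Animal("bull_2", {"variant_A": 0.0})
# Untested cow whose sire was a known carrier: about a 1/2 chance she is a carrier
# (ignoring any contribution from her dam).
cow = Animal("cow_1", {"variant_A": 0.5})

print(mating_risk(carrier_bull, cow), acceptable(carrier_bull, cow), acceptable(clear_bull, cow))
# prints: {'variant_A': 0.125} False True
```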
In the 2012/13 season, a yearling genomic sire whose sire had a de novo mutation in the
prolactin receptor (Littlejohn et al., 2014) was used in the genomic team. The mutation
changed the animals' coat type, heat tolerance and ability to lactate. The mutation was
dominant, which means the traits were exhibited if the offspring inherited a single copy
of the mutation. The causative mutation was identified quickly by genome-wide
association mapping of affected and unaffected individuals using the whole-genome
sequence data set, and a genetic test was built to identify affected individuals in
farmers' herds. As a consequence of the mutation, from 2013 LIC decided not to use
yearling sires widely in the genomic teams.
A considerable worldwide effort has been made in the development of new statistical
models for genomic evaluation. These include the single-step approach (Misztal et al.,
2014) and marker model approaches (Liu et al., 2014; Fernando et al., 2014). Such
approaches are becoming computationally feasible for large dairy cattle populations.
These methods should offer increased genomic breeding value accuracy by removing the
errors that arise from the approximations required in the current statistical models.
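The single-step approach combines pedigree and genomic relationships in one matrix, H, whose inverse has a simple closed form: H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]], where A22 is the pedigree relationship matrix of the genotyped animals. The sketch below builds it for a five-animal pedigree. The pedigree, the genomic relationships and the 5% blending weight are all hypothetical, and LIC's hybrid single-step model may differ from this standard form.

```python
import numpy as np


def numerator_relationship(sires, dams):
    """Pedigree relationship matrix A by the tabular method.

    Animals are numbered 0..n-1 with parents listed before offspring; -1 = unknown parent.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        # Diagonal = 1 + inbreeding coefficient (half the parents' relationship).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


def h_inverse(A, G_raw, genotyped, blend=0.05):
    """Single-step H^-1; G_raw is blended with A22 (assumed 5%) so that G is invertible."""
    idx = np.ix_(genotyped, genotyped)
    A22 = A[idx]
    G = (1.0 - blend) * G_raw + blend * A22
    H_inv = np.linalg.inv(A)
    H_inv[idx] += np.linalg.inv(G) - np.linalg.inv(A22)
    return H_inv, G


# Hypothetical pedigree: 0 and 1 are founders, 2 = 0 x 1, 3 = 0 x unknown, 4 = 2 x 3.
sires = [-1, -1, 0, 0, 2]
dams = [-1, -1, 1, -1, 3]
genotyped = [2, 3, 4]
# Hypothetical genomic relationships among the genotyped animals.
G_raw = np.array([[1.02, 0.20, 0.58],
                  [0.20, 0.98, 0.66],
                  [0.58, 0.66, 1.10]])

A = numerator_relationship(sires, dams)
H_inv, G = h_inverse(A, G_raw, genotyped)
print(np.round(H_inv, 3))
```

A useful property of this construction is that inverting H^-1 returns G exactly in the block of genotyped animals, while relationships of non-genotyped animals are adjusted through the pedigree. In practice H^-1 is used directly in the mixed-model equations, so H itself is never formed.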
The current performance of genomic selection in NZ, where 102,000 genotyped animals
are evaluated using a hybrid single-step model (Winkelman et al., 2015), has improved
considerably compared with the 2008 version of genomic selection. The validation
correlations now range from 0.60 to 0.85, and the inflation values from 0.9 to 1.05.
Challenges for small populations
The NZ genomic selection experience has provided insight into the key factors that
contribute to a successful genomic selection programme.
1. There is no substitute for training population size: the bigger the training
population, the better the genomic selection accuracy. The required size of the
training population is determined by the number of chromosome segments
segregating in the population, so it will be larger for multi-breed data sets and
populations with large effective population sizes than for single-breed populations
with a small effective population size, such as the Holstein population.
2. Cows are an important resource for genotyping because they add to the training
population. Our experience suggests that 6 to 8 genotyped cows provide a similar
improvement in genomic accuracy to one progeny-tested sire.
3. Across-country genomic evaluation is an important tool for small populations. The
Brown Swiss Intergenomics project run by the Interbull Centre in Sweden has been a
very successful implementation of genomic selection. Provided that common sires are
used across the participating countries and the between-country genetic correlations
are close to unity, an across-country genomic evaluation will provide increased
accuracy for small populations.
4. The cost of sequencing is decreasing rapidly, so sequencing key sires could be a
useful activity. The sequence could be made available to the 1000 Bull Genomes Project.
This would allow smaller populations to leverage the data and tools provided by the
1000 Bull Genomes Project to help analyse their sequence data. The outcomes could
include finding SNP markers that provide higher genomic prediction accuracies than
the standard Bovine SNP panels, and the ability to detect and manage deleterious
genes/variants.
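Point 1 can be made roughly quantitative with the deterministic formula of Daetwyler and colleagues (2008), in which expected accuracy depends on the training size N, the reliability of the training phenotypes and the number of independent chromosome segments M_e. The sketch below is a planning aid only: the M_e = 2 Ne L approximation is one of several in use, and the effective population size, genome length, reliability and training sizes are hypothetical. It also converts genotyped cows into sire equivalents using the 6 to 8 cows per sire figure from point 2.

```python
import math


def effective_segments(ne: float, genome_length_morgans: float) -> float:
    """Number of independent chromosome segments, M_e = 2 * Ne * L (one common approximation)."""
    return 2.0 * ne * genome_length_morgans


def expected_accuracy(n_train: float, reliability: float, m_e: float) -> float:
    """Deterministic accuracy r = sqrt(N h2 / (N h2 + M_e)).

    For progeny-tested sires, the reliability of the daughter-based breeding value is
    used in place of h2 (an assumption of this sketch).
    """
    nh2 = n_train * reliability
    return math.sqrt(nh2 / (nh2 + m_e))


def sire_equivalents(n_sires: int, n_cows: int, cows_per_sire: float = 7.0) -> float:
    """Rough training size, treating about 7 genotyped cows as one progeny-tested sire."""
    return n_sires + n_cows / cows_per_sire


# Hypothetical inputs: Ne = 100 and a 30 Morgan genome give M_e = 6,000.
m_e = effective_segments(ne=100, genome_length_morgans=30)
for n in (1_000, 5_000, sire_equivalents(1_000, 28_000)):
    print(f"N = {n:,.0f}: expected accuracy {expected_accuracy(n, 0.8, m_e):.2f}")
```

Under these assumptions, 1,000 sires give an expected accuracy of about 0.34, while adding 28,000 genotyped cows (about 4,000 sire equivalents) raises it to about 0.63. A larger M_e, as in multi-breed data or a population with a large effective size, lowers accuracy for the same N.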
Conclusions
The introduction of genomic information into the LIC dairy cattle breeding scheme has been
a steep learning curve over the last nine years. Initially, dairy farmers who used the
new technology did not benefit to the degree that was expected, which is not an
uncommon situation with new technology. Increased investment by LIC was required to
improve the accuracy of genomic selection in NZ. Further investment in sequencing and
the implementation of new statistical methods is expected to continue to improve the
accuracy of genomic selection, and future breeding schemes are expected to use
increasing levels of genomic information at the expense of progeny testing.
In the genomics era, small populations have even greater difficulty competing with
large populations such as Holstein. The key is to have the largest training population
possible. This may require genotyping both bulls and cows, and sharing genotypes and
phenotypes across countries. Providing sequence data to the 1000 Bull Genomes Project
could provide benefits by pooling sequence data across multiple breeds, allowing small
populations to take part in large across-breed genome-wide association studies.
References
Fernando, R. L., Dekkers, J. C. M. and Garrick, D. J. (2014). Genet. Sel. Evol. 46
Georges, M., Nielsen, D., Mackinnon, M., et al. (1995). Genetics 139
Harris, B. L., Creagh, F., Winkelman, A. M., et al. (2011). Interbull Bull. 44
Harris, B. L., Winkelman, A. M. and Johnson, D. L. (2013). Interbull Bull. 46
Kappes, S. M., Green, R. D. and van Tassell, C. P. (2006). In Proc. 8th WCGALP
Littlejohn, M. D., Henty, K. M., Tiplady, K., Johnson, T., et al. (2014). Nat. Commun.
Liu, Z., Goddard, M. E., Reinhardt, F. and Reents, R. (2014). J. Dairy Sci.
Meuwissen, T. H. E., Hayes, B. J. and Goddard, M. E. (2001). Genetics 157(4)
Meuwissen, T. and Goddard, M. (2010). Genetics 185(2)
Misztal, I., Legarra, A. and Aguilar, I. (2014). J. Dairy Sci.
Pryce, J. E. and Daetwyler, H. D. (2012). Animal Production Science 52(3)
Spelman, R. J. (2002). In Proc. 7th WCGALP
Spelman, R. J., Keehan, M. D., Obolonkin, V., Winkelman, A. M., Johnson, D. L. and
Harris, B. L. (2012). Interbull Bull. 45