Technical options for all-breed Single-step GBLUP for US dairy cattle
Abstract
The multi-step method for genomic prediction has worked remarkably well for US dairy cattle, but intense genomic selection makes recent genetic trends difficult to estimate in pedigree-only BLUP evaluations. The introduction of routine single-step GBLUP (ssGBLUP) is therefore under study. The large size of the US dairy cattle data precludes naïve approaches to genomic prediction. Here we present the technical choices and requirements of an all-breed (6 breeds and all existing crosses) ssGBLUP applied to different sets of traits within trait groups such as fertility, livability and health. For each trait group, we first prune the pedigree to animals with records and their ancestors, which reduces the size of the pedigree and improves memory use and convergence. The model includes only genotypes of animals in this pruned pedigree; the other animals are predicted afterwards using either Parent Average (if not genotyped) or the sum of SNP effects (if genotyped). The set of markers is the usual CDCB set of 78,964 markers and includes autosomes and sex chromosomes. The ssGBLUP method used a genomic relationship matrix (G) with the Algorithm for Proven and Young (APY) and metafounders (MF). APY greatly reduces computational needs, whereas MF provide smooth solutions for unknown origins and automatic compatibility of pedigree and genomic relationships within and across breeds. The gamma matrix was constructed from base allele frequencies across breeds and increases of inbreeding within breeds. Core animals were chosen within breed in a heuristic but complete and repeatable manner: genotyped sires with more than a certain number of daughters with records, and a deterministic subset of genotyped cows with records. This resulted in ~45K core animals and ~2M non-core animals for fertility evaluations. Memory needs are still large, as the inverse of G_APY, stored in double precision, takes ~720 GB. We therefore used memory mapping (mmap) to back this memory with disk space.
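As a rough illustration of the memory figure and of memory mapping, the sketch below first reproduces the ~720 GB estimate, assuming that the dominant storage is one dense non-core by core block of 8-byte floats (an assumption that is consistent with the numbers quoted above, not a description of the actual software). It then shows a toy disk-backed matrix using NumPy's `memmap`. The function names `dense_block_bytes`, `create_disk_backed` and `matvec_in_chunks` are hypothetical and chosen only for this example.

```python
import numpy as np


def dense_block_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Bytes needed to store a dense n_rows x n_cols block (8 bytes = double)."""
    return n_rows * n_cols * itemsize


def create_disk_backed(path: str, n_rows: int, n_cols: int) -> np.memmap:
    """Create a float64 matrix whose storage is a file on disk, not RAM."""
    return np.memmap(path, dtype=np.float64, mode="w+", shape=(n_rows, n_cols))


def matvec_in_chunks(block, x, chunk_rows: int = 256) -> np.ndarray:
    """Multiply a (possibly disk-backed) matrix by a vector, a few rows at a time.

    Only chunk_rows rows need to be resident in RAM at once; the operating
    system pages the rest in from disk on demand.
    """
    n_rows = block.shape[0]
    out = np.empty(n_rows)
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        out[start:stop] = block[start:stop] @ x
    return out
```

With the approximate sizes from the abstract, `dense_block_bytes(2_000_000, 45_000) / 1e9` evaluates to 720.0, that is, ~720 GB. In a real evaluation the matrix would be far too large to fill in one assignment and would be written block by block.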
For fertility (4 traits), computing the inverse of G_APY took 28 h and 100 GB of RAM using mmap. Solving the mixed model equations (MME) took 22 h, 120 GB of RAM and 476 rounds of preconditioned conjugate gradient (PCG). Genomic reliabilities took 120 GB of RAM and 8 h per trait. Backsolving for SNP solutions took negligible time and memory. Owing to the developments reported here, ssGBLUP computations for this very large database can be done in reasonable time and with reasonable memory.
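The pedigree pruning step described in the abstract (keep animals with records and all of their ancestors) can be sketched as a simple upward traversal of the pedigree. This is a minimal illustration under assumed data structures: the pedigree is taken to be a dictionary mapping each animal to a (sire, dam) pair, with `None` for an unknown parent, and the function name `prune_pedigree` is hypothetical. It is not the authors' implementation.

```python
from collections import deque


def prune_pedigree(pedigree, animals_with_records):
    """Keep only animals with records and their ancestors.

    pedigree: dict mapping animal -> (sire, dam); None marks an unknown parent.
    animals_with_records: iterable of animal identifiers.
    Returns a new dict restricted to the retained animals.
    """
    keep = set()
    queue = deque(animals_with_records)
    while queue:
        animal = queue.popleft()
        if animal is None or animal in keep:
            continue  # unknown parent, or already visited
        keep.add(animal)
        # Animals missing from the pedigree are treated as having unknown parents.
        sire, dam = pedigree.get(animal, (None, None))
        queue.append(sire)
        queue.append(dam)
    return {animal: pedigree.get(animal, (None, None)) for animal in keep}
```

Each retained animal is visited once, so the cost grows linearly with the size of the pruned pedigree. Animals that are neither recorded nor ancestors of a recorded animal drop out and, as described above, would be predicted after the main evaluation.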