Technical options for all-breed Single-step GBLUP for US dairy cattle

Authors

  • Andres Legarra
  • Matias Bermann
  • Paul M VanRaden
  • Ezequiel Luis Nicolazzi
  • Rodrigo R Mota
  • Joe Menwer Tabet
  • Daniela L Lourenco
  • Ignacy Misztal

Abstract

The multi-step method for genomic prediction has worked remarkably well for US dairy cattle, but intense genomic selection makes recent genetic trends difficult to estimate in pedigree-only BLUP evaluations. Thus, the introduction of routine single-step GBLUP (ssGBLUP) is under study. The large size of the US dairy cattle data precludes naïve approaches to genomic prediction. Here we present the technical choices and needs of an all-breed (6 breeds and all existing crosses) ssGBLUP applied to different sets of traits within trait groups such as fertility, livability and health. For each trait group, we first prune the pedigree to animals with records and their ancestors, which reduces the size of the pedigree and improves memory use and convergence. The model includes only genotypes of animals in this pruned pedigree; the other animals are predicted later, using either Parent Average (if not genotyped) or the sum of SNP effects (if genotyped). The set of markers is the usual CDCB set of 78,964 markers, which includes autosomes and sex chromosomes. The ssGBLUP method used a G matrix with the Algorithm for Proven and Young (APY) and metafounders (MF). APY greatly reduces computational needs, whereas MF provide smooth solutions for unknown origins and automatic compatibility of pedigree and genomic relationships within and across breeds. The gamma matrix was constructed from base allele frequencies across breeds and increases of inbreeding within breeds. Core animals were chosen within breed in a heuristic but complete and repeatable manner: genotyped sires with more than a certain number of daughters with records, and a deterministic subset of genotyped cows with records. This resulted in ~45K core animals and ~2M non-core animals for the fertility evaluations. Still, memory needs are large, as the inverse of G_APY, stored in double precision, takes ~720 GB. Thus, we used memory mapping (mmap) to back this memory with disk space.
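The ~720 GB figure is consistent with a dense block of about 45K by 2M double-precision values, and the memory-mapping idea can be illustrated on a toy scale. The sketch below is only an illustration under stated assumptions: it assumes that the dominant dense block of the inverse of G_APY is the core by non-core block, it uses NumPy's `numpy.memmap` as a stand-in for whatever mmap mechanism the authors' software uses, and the helper `dense_block_bytes`, the toy dimensions and the placeholder values are hypothetical.

```python
import os
import tempfile

import numpy as np


def dense_block_bytes(n_rows, n_cols, bytes_per_value=8):
    """Bytes needed to store a dense n_rows x n_cols block (8 bytes = double precision)."""
    return n_rows * n_cols * bytes_per_value


# Approximate sizes quoted in the abstract (assumed to define the dominant dense block).
n_core, n_noncore = 45_000, 2_000_000
full_size_gb = dense_block_bytes(n_core, n_noncore) / 1e9
print(f"core x non-core block in double precision: ~{full_size_gb:.0f} GB")

# Toy-sized demonstration of a disk-backed array (hypothetical dimensions).
rows, cols = 50, 2_000
path = os.path.join(tempfile.mkdtemp(), "block.bin")

# mode="w+" creates the file on disk; pages are loaded into RAM only when touched.
block = np.memmap(path, dtype=np.float64, mode="w+", shape=(rows, cols))

# Fill the block in column chunks so only a slice is worked on at a time.
chunk = 500
for start in range(0, cols, chunk):
    block[:, start:start + chunk] = float(start)  # placeholder values, not real coefficients

block.flush()  # write dirty pages back to the file
del block      # release the mapping

# Reopen read-only, as a solver would do when it needs the stored block.
reopened = np.memmap(path, dtype=np.float64, mode="r", shape=(rows, cols))
file_bytes = os.path.getsize(path)
print(file_bytes, reopened[0, 1500])
```

With this layout, resident RAM depends on which pages are being touched rather than on the full size of the array, which is the general reason a mapped file can hold a matrix larger than physical memory.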
For fertility (4 traits), computing the inverse of G_APY took 28 h and 100 GB of RAM using mmap. Solving the mixed model equations (MME) took 22 h, 120 GB of RAM and 476 rounds of preconditioned conjugate gradient (PCG). Genomic reliabilities took 120 GB of RAM and 8 h per trait. Backsolving for SNP solutions took negligible time and memory. Owing to the developments reported here, ssGBLUP computations for this very large database can be done with reasonable time and memory.
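The pedigree pruning step described above (keeping animals with records and all of their ancestors) can be sketched as a simple upward traversal of the pedigree. This is a minimal illustration, not the authors' implementation: the function `prune_pedigree`, the dictionary layout and the toy animal identifiers are all hypothetical, and unknown parents are coded as `None` here rather than as metafounders.

```python
def prune_pedigree(pedigree, animals_with_records):
    """Keep animals with records and all of their ancestors.

    pedigree: dict mapping animal -> (sire, dam), with None for an unknown parent.
    animals_with_records: iterable of animal identifiers that have phenotypes.
    Returns a new dict restricted to the retained animals.
    """
    keep = set()
    stack = list(animals_with_records)
    while stack:
        animal = stack.pop()
        if animal is None or animal in keep:
            continue  # unknown parent, or already visited
        keep.add(animal)
        sire, dam = pedigree.get(animal, (None, None))
        stack.extend((sire, dam))  # walk up to both parents
    return {a: pedigree.get(a, (None, None)) for a in keep}


# Toy pedigree (hypothetical identifiers): only E has a record.
toy_pedigree = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", None),
    "E": ("C", "D"),
    "F": ("C", None),  # no record and no descendant with a record
    "G": ("F", "E"),   # no record and no descendant with a record
}
pruned = prune_pedigree(toy_pedigree, {"E"})
print(sorted(pruned))
```

In this toy case F and G are dropped because neither has a record nor a descendant with one; in the scheme described in the abstract, such animals would be predicted afterwards from Parent Average or from the sum of SNP effects.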

Published

2024-09-04