Ensemble-based imputation for genomic selection: an application to Angus cattle

Authors

  • Chuanyu Sun
  • Xiao-Lin Wu
  • Kent A. Weigel
  • Guilherme J.M. Rosa
  • Stewart Bauck
  • Brent W. Woodward
  • Robert D. Schnabel
  • Jeremy F. Taylor
  • Daniel Gianola

Keywords

AdaBoost, cattle, ensemble-based system, genomic selection, imputation, single nucleotide polymorphisms (SNP)

Abstract

Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection because it can markedly reduce genotyping costs. Several imputation software packages have been developed; however, they vary in imputation accuracy, and the imputed genotypes may be inconsistent across methods. An AdaBoost-like approach was developed to combine imputation results from several independent software packages, namely Beagle (v3.3), IMPUTE (v2.0), fastPHASE (v1.4), AlphaImpute, findhap (v2), and FImpute (v2), with each package serving as a base classifier in an ensemble-based system. The ensemble method computes weights sequentially for all classifiers and combines the results of the component methods via weighted majority “voting” to determine unknown genotypes. The data included 3,078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16, and BTA28) were used to compare imputation accuracy among methods, and our application involved imputation of 50K genotypes covering 29 chromosomes from a set of 5K genotypes. Beagle and FImpute had the greatest accuracy among the individual packages, ranging from 0.8677 to 0.9858. The proposed ensemble method was more accurate than any single package, but the order of the independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as the first classifier, followed by one or two methods that utilized pedigree information. A salient feature of our ensemble method is that it can resolve inconsistencies among different imputation methods, leading to a more reliable system for imputing genotypes than the use of individual methods alone.
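
To make the voting idea concrete, the following is a minimal Python sketch of sequential classifier weighting followed by weighted majority voting. It is not the authors' implementation: the weight formula is the standard multi-class AdaBoost (SAMME) rule, which is assumed here only because the abstract describes the approach as AdaBoost-like. The function names `sequential_weights` and `weighted_vote`, the 0/1/2 genotype coding, and the toy genotype calls are all hypothetical.

```python
import math
from collections import defaultdict


def sequential_weights(calls, truth, n_classes=3):
    """Compute one weight per classifier, in the given voting order.

    calls: list of genotype-call lists, one per imputation package (assumed layout)
    truth: known genotypes (coded 0, 1, 2) at masked validation loci
    """
    n = len(truth)
    sample_w = [1.0 / n] * n  # start with equal weight on every locus
    alphas = []
    for pred in calls:
        # Weighted error rate of this classifier under the current locus weights
        err = sum(w for w, p, t in zip(sample_w, pred, truth) if p != t)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0)
        # SAMME weight (an assumption, not taken from the paper); floor at zero
        alpha = max(math.log((1.0 - err) / err) + math.log(n_classes - 1), 0.0)
        alphas.append(alpha)
        # Up-weight the loci this classifier got wrong, then renormalize,
        # so that later classifiers are judged mainly on the hard loci
        sample_w = [w * math.exp(alpha) if p != t else w
                    for w, p, t in zip(sample_w, pred, truth)]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return alphas


def weighted_vote(calls_at_locus, alphas):
    """Return the genotype with the largest summed classifier weight."""
    score = defaultdict(float)
    for genotype, alpha in zip(calls_at_locus, alphas):
        score[genotype] += alpha
    return max(score, key=score.get)


# Toy validation data: three hypothetical packages, six masked loci
truth = [0, 1, 2, 1, 0, 2]
calls = [
    [0, 1, 2, 1, 0, 1],  # package A: one error
    [0, 1, 1, 1, 0, 2],  # package B: one error, at a different locus
    [2, 1, 2, 0, 0, 1],  # package C: three errors
]
alphas = sequential_weights(calls, truth)
consensus = weighted_vote([2, 1, 1], alphas)  # packages disagree at this locus
```

Because the locus weights are updated after each classifier, the weights depend on the order in which the packages are presented, which is consistent with the order effect reported above.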

Author Biographies

Chuanyu Sun

Department of Dairy Science, University of Wisconsin, Madison

Xiao-Lin Wu

Department of Dairy Science, University of Wisconsin, Madison

Kent A. Weigel

Department of Dairy Science, University of Wisconsin, Madison

Guilherme J.M. Rosa

Department of Animal Sciences, University of Wisconsin, Madison

Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison

Stewart Bauck

Merial Limited, Duluth, GA

Brent W. Woodward

Merial Limited, Duluth, GA

Robert D. Schnabel

Division of Animal Sciences, University of Missouri

Jeremy F. Taylor

Division of Animal Sciences, University of Missouri

Daniel Gianola

Department of Dairy Science, University of Wisconsin, Madison

Department of Animal Sciences, University of Wisconsin, Madison

Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison

Published

2012-05-24