Comparison of different imputation methods

  • J Johnston
  • G Kistemaker
  • P G Sullivan

Abstract

Genotype imputation is a powerful tool for including animals genotyped with low-density panels in higher-density genomic evaluations without having to genotype them with more expensive high-density panels. A range of imputation software programs is available, and their performance needs to be compared on data that represent the population for which the programs will be used. This study selected five imputation programs (AlphaImpute, BEAGLE, FImpute, findhap, PHASEBOOK) that were feasible for the structure and size of dairy genomic data sets and compared their ability to impute genotypes from the Illumina Bovine3K BeadChip (3k) to genotypes from the Illumina Bovine SNP50 chip (50k) using two data sets. The first data set aimed to represent a small dairy breed with a limited number of genotyped dams. The second data set mimicked a large dairy breed with many genotyped animals and the majority of parents genotyped. All five programs performed very well in imputing genotypes from 3k to 50k density: on average, each of them imputed more than 90% of genotypes correctly. Each program had certain strengths and weaknesses. FImpute was the fastest program and the most accurate for animals with family information. BEAGLE was the most accurate for animals with limited family information. The choice of optimal imputation software is highly dependent on the structure of the genotype file, namely the proportion of animals whose ancestors are genotyped. Blending results from FImpute and BEAGLE seems to be a viable solution for current dairy data sets, which contain a proportion of animals without genotyped parents.
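
As a rough illustration of the two ideas above (the share of genotypes imputed correctly, and blending the output of two programs), the following Python sketch may help. It is not the authors' procedure: the abstract does not say how accuracy was computed in detail or how the blending was done. The genotype coding (0, 1, 2, with 5 for a missing call), the function names, and the rule "take the FImpute result when a parent is genotyped, otherwise the BEAGLE result" are all assumptions made for the example.

```python
from typing import Dict, List

MISSING = 5  # assumed code for an uncalled genotype; not taken from the abstract


def concordance(true_geno: List[int], imputed_geno: List[int]) -> float:
    """Fraction of SNPs with a known true genotype that were imputed correctly."""
    if len(true_geno) != len(imputed_geno):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    correct = 0
    for true_call, imputed_call in zip(true_geno, imputed_geno):
        if true_call == MISSING:
            continue  # nothing to compare against
        compared += 1
        if true_call == imputed_call:
            correct += 1
    return correct / compared if compared else float("nan")


def blend(
    fimpute: Dict[str, List[int]],
    beagle: Dict[str, List[int]],
    has_genotyped_parent: Dict[str, bool],
) -> Dict[str, List[int]]:
    """Hypothetical blending rule: prefer the FImpute result for animals with
    a genotyped parent, and fall back to the BEAGLE result for all others."""
    blended = {}
    for animal, beagle_geno in beagle.items():
        if has_genotyped_parent.get(animal, False) and animal in fimpute:
            blended[animal] = fimpute[animal]
        else:
            blended[animal] = beagle_geno
    return blended
```

In a validation setting, `concordance` would be applied per animal to the true 50k genotypes and the genotypes imputed from the masked 3k subset, and the per-animal values averaged within each group (with or without genotyped parents) before comparing programs.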