Detecting Relationships among Genotypes in a Rapidly Growing Collection


  • George Wiggans, Animal Improvement Programs Laboratory, ARS-USDA
  • Gerald Jansen, Council on Dairy Cattle Breeding
  • Lillian Bacheller, Council on Dairy Cattle Breeding
  • José Carrillo, Council on Dairy Cattle Breeding


To correct pedigree errors and discover genotype misassignments, the Council on Dairy Cattle Breeding in the United States compares each new genotype with existing genotypes. With over 6 million genotypes as of May 2022, this is a computationally demanding task. The process was recently revised to maintain a table of genotype pairs that are similar enough to qualify as having a parent-progeny relationship or as being identical. Those pairs are identified by unique genotype identifiers and thus are unaffected by changes in the assignment of genotypes to animals. Having this table substantially reduces processing time when propagating the effects of pedigree or assignment changes on the usability of genotypes. A set of 3,552 SNPs selected on the basis of call rate and Mendelian consistency is used for the comparisons. The percentage of conflicting SNPs is checked after 96 and again after 1,000 SNPs, and the comparison stops early if the members of a genotype pair are unlikely to be related. The memory required to store the set of genotypes being searched is minimized by using just 2 bits per SNP. The time to access those genotypes is minimized by memory mapping, which effectively makes the disk where the genotypes are stored an extension of memory. To reduce processing time, new or updated genotypes are compared with a restricted set of genotypes (one per animal). All animals with genotyped progeny are checked. The remaining genotypes are compared in birth-date order so that no genotypes from animals born more than 12 years earlier are checked; this limit is reduced to 5 years if both parents of the animal are confirmed. Non-AI bulls with no progeny born in the last 5 years are skipped. Unlikely grandsires are initially identified with SNP-at-a-time comparisons on the same 3,552 SNPs, using the genotype of the other parent if available. During weekly and monthly evaluations, grandsires are validated using imputed haplotype comparisons.
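The pair table described above can be pictured as a small data structure. The sketch below is a minimal illustration under stated assumptions, not the production implementation: pairs are keyed only by the two genotype identifiers, and a separate, hypothetical `genotype_to_animal` mapping holds the current assignments. Because the table never stores animal IDs, reassigning a genotype changes only that mapping, and the pairs touching the changed genotype lead directly to the animals whose genotype usability may need rechecking. `RelationshipTable` and `affected_animals` are illustrative names.

```python
from collections import defaultdict


class RelationshipTable:
    """Hypothetical store of qualifying genotype pairs, keyed by genotype ID only."""

    def __init__(self):
        self.pairs = {}                      # (gid_a, gid_b) -> relationship label
        self.by_genotype = defaultdict(set)  # gid -> IDs of paired genotypes

    @staticmethod
    def _key(a, b):
        # Store each unordered pair once, with the IDs in sorted order.
        return (a, b) if a <= b else (b, a)

    def add(self, a, b, label):
        self.pairs[self._key(a, b)] = label
        self.by_genotype[a].add(b)
        self.by_genotype[b].add(a)

    def partners(self, gid):
        """Paired genotypes and their labels; .get() avoids creating empty entries."""
        return {p: self.pairs[self._key(gid, p)] for p in self.by_genotype.get(gid, ())}


def affected_animals(table, genotype_to_animal, changed_gids):
    """Animals currently assigned a genotype paired with any changed genotype."""
    animals = set()
    for gid in changed_gids:
        for partner in table.partners(gid):
            animal = genotype_to_animal.get(partner)
            if animal is not None:
                animals.add(animal)
    return animals
```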
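The early-stopping rule for conflict percentages can be sketched as follows. Only the checkpoints at 96 and 1,000 SNPs come from the text; everything else is an assumption: SNP calls are coded as 0, 1, or 2 copies of one allele with 3 for missing, a parent-progeny conflict is an opposite-homozygous SNP while an identity check counts any difference, and the 2% cut-off (`max_pct`) for "unlikely to be related" is a placeholder value.

```python
MISSING = 3                # assumed code for a missing SNP call
CHECKPOINTS = (96, 1000)   # early-stop points taken from the text


def conflict_percentage(g1, g2, identical=False, max_pct=2.0, checkpoints=CHECKPOINTS):
    """Percentage of conflicting SNPs between two genotypes, with early stopping.

    Returns (percent, snps_compared, stopped_early). max_pct is a hypothetical
    cut-off above which a pair is treated as unlikely to be related.
    """
    compared = conflicts = 0
    remaining = iter(checkpoints)
    next_check = next(remaining, None)
    for a, b in zip(g1, g2):
        if a == MISSING or b == MISSING:
            continue  # missing calls are not counted as compared SNPs
        compared += 1
        # Identity: any difference conflicts; parent-progeny: only opposite homozygotes.
        if (a != b) if identical else (abs(a - b) == 2):
            conflicts += 1
        if compared == next_check:
            pct = 100.0 * conflicts / compared
            if pct > max_pct:
                return pct, compared, True  # unlikely to be related: stop here
            next_check = next(remaining, None)
    pct = 100.0 * conflicts / compared if compared else 0.0
    return pct, compared, False
```

A pair that is clearly unrelated is abandoned after 96 compared SNPs, so the full 3,552 SNPs are read only for pairs that stay under the cut-off at both checkpoints.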
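Two bits are enough for each SNP because there are only four possible codes (three genotypes plus missing), so four SNPs fit in one byte. A minimal packing sketch, assuming the same hypothetical 0/1/2/3 coding: with 3,552 SNPs each genotype occupies 888 bytes, so 6 million genotypes need about 5.3 GB, compared with about 21.3 GB at one byte per SNP. The function names are illustrative.

```python
def pack_genotype(codes):
    """Pack SNP codes (0, 1, 2 = copies of one allele; 3 = missing) into 2 bits each."""
    out = bytearray((len(codes) + 3) // 4)  # four SNPs per byte, last byte zero-padded
    for i, c in enumerate(codes):
        if not 0 <= c <= 3:
            raise ValueError(f"invalid SNP code {c!r} at position {i}")
        out[i // 4] |= c << (2 * (i % 4))   # SNP i occupies bits 2*(i%4) .. 2*(i%4)+1
    return bytes(out)


def unpack_genotype(packed, n_snps):
    """Recover the first n_snps codes; padding bits beyond n_snps are ignored."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_snps)]
```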
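Memory mapping can be illustrated with Python's standard `mmap` module. This sketch assumes a hypothetical file layout in which packed genotypes are written back-to-back as fixed-length 888-byte records; the operating system then reads pages from disk only when a record is accessed, so the file does not have to be loaded into memory up front. `write_records` and `GenotypeFile` are illustrative names.

```python
import mmap
import os
import tempfile

RECORD_BYTES = 888  # 3,552 SNPs x 2 bits = 888 bytes per packed genotype


def write_records(path, records):
    """Write packed genotypes back-to-back as fixed-length records (assumed layout)."""
    with open(path, "wb") as f:
        for rec in records:
            if len(rec) != RECORD_BYTES:
                raise ValueError("packed genotype has the wrong length")
            f.write(rec)


class GenotypeFile:
    """Read-only, memory-mapped view of a file of fixed-length genotype records."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are read from disk on first access.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.n_records = len(self._map) // RECORD_BYTES

    def record(self, i):
        if not 0 <= i < self.n_records:
            raise IndexError(i)
        start = i * RECORD_BYTES
        return self._map[start:start + RECORD_BYTES]  # copies just this record

    def close(self):
        self._map.close()
        self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "genotypes.bin")
        write_records(path, [bytes([i]) * RECORD_BYTES for i in range(3)])
        with GenotypeFile(path) as gf:
            print(gf.n_records, gf.record(2)[:4])
```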
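The rules for restricting the comparison set can be expressed as a filter. This is one reading of the text, and several assumptions are marked in the code: the candidate list already holds one genotype per animal, the 12- and 5-year windows count back from the new animal's birth date (younger candidates are always kept), the non-AI-bull rule applies only to animals without genotyped progeny, and the `Candidate` fields and `select_candidates` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Candidate:
    """One genotype per animal (hypothetical fields)."""
    animal_id: str
    birth_date: date
    has_genotyped_progeny: bool = False
    is_non_ai_bull: bool = False
    last_progeny_birth: Optional[date] = None


def years_before(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # 29 February with a non-leap target year
        return d.replace(year=d.year - years, day=28)


def select_candidates(new_birth: date, both_parents_confirmed: bool,
                      candidates: List[Candidate], today: date) -> List[Candidate]:
    """Genotypes to compare with a new genotype, following the rules in the text."""
    window = 5 if both_parents_confirmed else 12  # years, from the text
    cutoff = years_before(new_birth, window)
    recent = years_before(today, 5)               # "no progeny born in the last 5 years"
    # Animals with genotyped progeny are always checked.
    selected = [c for c in candidates if c.has_genotyped_progeny]
    # The rest are visited newest first, so the scan can stop at the age cutoff.
    rest = sorted((c for c in candidates if not c.has_genotyped_progeny),
                  key=lambda c: c.birth_date, reverse=True)
    for c in rest:
        if c.birth_date < cutoff:
            break  # every remaining candidate is older still
        # Assumption: the non-AI-bull rule applies only to animals without genotyped progeny.
        if c.is_non_ai_bull and (c.last_progeny_birth is None or c.last_progeny_birth < recent):
            continue
        selected.append(c)
    return selected
```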
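The text does not spell out the SNP-at-a-time grandsire test, so the following is only a plausible sketch for a maternal grandsire, treating the sire as "the other parent" and assuming the same 0/1/2/3 coding. The sire's genotype is used to work out which allele the progeny received from its dam; the candidate grandsire counts as excluding at a SNP when it is homozygous for the other allele. One such SNP cannot rule out the relationship (the dam may have inherited that allele from the granddam), so the result is a rate to compare against a threshold, which is left unspecified here. Both function names are hypothetical.

```python
MISSING = 3  # assumed code for a missing SNP call


def maternal_allele(progeny, sire):
    """Copies (0 or 1) of the counted allele received from the dam, or None if unknown."""
    if progeny == MISSING:
        return None
    if progeny in (0, 2):
        if sire in (0, 2) and sire != progeny:
            return None  # sire-progeny conflict: this SNP says nothing about the dam side
        return progeny // 2  # homozygous progeny: both alleles are the same
    # Heterozygous progeny: resolvable only when the sire is homozygous.
    if sire in (0, 2):
        return 1 - sire // 2
    return None


def grandsire_exclusion_rate(progeny, sire, grandsire):
    """Share of informative SNPs where the grandsire is homozygous for the other allele."""
    informative = excluded = 0
    for p, s, g in zip(progeny, sire, grandsire):
        m = maternal_allele(p, s)
        if m is None or g == MISSING:
            continue
        informative += 1
        if g == 2 * (1 - m):  # grandsire carries only the allele the dam did not pass on
            excluded += 1
    return excluded / informative if informative else None
```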
Relying on the new procedure to discover close relatives eliminates the need to access the full genotypes of all animals, as was previously done. Previously, to minimize database access, all genotypes were loaded into memory from a file. Now, only the full genotypes needed to confirm pedigree relationships are retrieved from the database. The genotypes in the database are compressed, which reduces storage by 75%. These modifications allow comprehensive genotype checking while keeping processing time within acceptable limits.
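Retrieving only the full genotypes that are needed can be sketched with SQLite from the Python standard library. The table name, schema, string encoding of SNP codes, and the use of `zlib` are all assumptions for illustration: the text does not say how the genotypes are compressed, and the 75% saving it reports should not be expected from this stand-in.

```python
import sqlite3
import zlib


def make_store():
    """In-memory stand-in for the genotype database (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE genotype (gid TEXT PRIMARY KEY, calls BLOB NOT NULL)")
    return con


def store(con, gid, calls):
    """Save a full genotype, given as a string of SNP codes, in compressed form."""
    con.execute("INSERT OR REPLACE INTO genotype (gid, calls) VALUES (?, ?)",
                (gid, zlib.compress(calls.encode("ascii"))))


def fetch_needed(con, gids):
    """Fetch and decompress only the requested genotypes; unknown IDs are ignored."""
    gids = list(gids)
    if not gids:
        return {}
    # For very long ID lists, query in batches to stay under SQLite's variable limit.
    marks = ",".join("?" * len(gids))
    rows = con.execute(f"SELECT gid, calls FROM genotype WHERE gid IN ({marks})", gids)
    return {gid: zlib.decompress(blob).decode("ascii") for gid, blob in rows}
```

In this arrangement, the IDs passed to `fetch_needed` would be those of genotypes whose pedigree relationships need confirming, so the volume read grows with the number of changes rather than with the size of the whole collection.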

Author Biography

Gerald Jansen, Council on Dairy Cattle Breeding