Lessons learned in pooling data for reference populations


  • E Wall
  • M P Coffey
  • R F Veerkamp
  • S McParland
  • G Banos


This study set out to demonstrate the feasibility of merging data from 4 experimental resource dairy populations (1 herd each in Scotland and Ireland, and 2 in the Netherlands) to create a pooled reference population for joint genetic and genomic analyses. The data comprised 60,058 weekly records from 1,630 Holstein-Friesian cows across the 4 herds and included 7 traits: milk, fat and protein yield, milk somatic cell count, live weight, dry matter intake, and energy intake and balance. Missing records were predicted using random regression models, so that each cow ultimately had 44 weekly records, corresponding to the typical 305-day lactation. Data were subsequently merged and analysed with mixed linear models. Genetic variance and heritability estimates were greater than zero (P<0.05) for all traits except average milk somatic cell count in weeks 16-44. The proportion of total phenotypic variance due to genotype by environment (sire by herd) interaction was not different from zero (P>0.05). When estimable, the genetic correlation between herds for the same trait ranged from 0.85 to 0.99. The results suggest that merging experimental herd data into a single dataset is both feasible and sensible, despite potential differences in management and recording of the animals in the 4 herds. Merging experimental data will increase the precision of parameter estimates in genetic analyses and enlarge the potential reference population for genome-wide association studies, especially for difficult-to-record traits.
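
The two variance ratios reported above (heritability and the proportion of phenotypic variance due to sire by herd interaction) can be illustrated with a minimal sketch. It assumes, purely for illustration, that total phenotypic variance is the sum of a genetic, a genotype by environment and a residual component; the study's mixed linear models may have included further terms. The function name and the numeric values are hypothetical and are not results from the study.

```python
def variance_ratios(var_genetic, var_gxe, var_residual):
    """Return (heritability, GxE proportion) from variance components.

    Assumption for this sketch: phenotypic variance is the plain sum of
    the three components passed in. A real model may hold more terms.
    """
    components = (var_genetic, var_gxe, var_residual)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    var_phenotypic = sum(components)
    if var_phenotypic == 0:
        raise ValueError("total phenotypic variance must be positive")
    heritability = var_genetic / var_phenotypic
    gxe_proportion = var_gxe / var_phenotypic
    return heritability, gxe_proportion


# Hypothetical variance components, chosen only to show the arithmetic.
h2, gxe = variance_ratios(var_genetic=3.0, var_gxe=0.5, var_residual=6.5)
print(f"heritability = {h2:.2f}, GxE proportion = {gxe:.2f}")
```

A GxE proportion close to zero, as in this made-up example, is the pattern that would support pooling herds into a single dataset.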