Genomic evaluation using machine learning algorithms in the Spanish Holstein population

Authors

  • Jose Antonio Jiménez Montero UPM
  • Oscar González-Recio INIA
  • Rafael Alenda UPM
  • Juan Pena CONAFE

Keywords:

genome-assisted evaluation, machine learning, predictive ability, model comparison

Abstract

The aim of this study was to validate the recorded data and genome-assisted evaluation models for the Spanish Holstein population as an initial step towards the first official national genomic evaluation.

Preliminary national genomic evaluations for production and type traits in Holstein Friesian bulls in Spain were tested using both the Spanish reference population (ESP), composed of 2,115 progeny-tested bulls, and the Eurogenomics population (EG), composed of 22,247 progeny-tested bulls. Four traits currently included in the Spanish genetic evaluation were used: milk yield (MY), fat yield (FY), protein yield (PY), and udder depth (UD).

Two genomic evaluation methodologies, Bayesian-Lasso (B-Lasso) and a machine learning algorithm, Random-Boosting (R-Boost), were compared with the traditional pedigree index (PI).
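The general idea behind a random boosting approach can be sketched as follows. This is a minimal illustration, not the authors' R-Boost implementation: it assumes L2 boosting in which, at each iteration, a random subset of SNP columns is drawn, each sampled SNP is fitted to the current residuals by least squares, and only the best SNP is added to the ensemble after scaling by a shrinkage factor. The function names, the base learner, and the parameters `n_iter`, `subset_frac` and `shrinkage` are hypothetical choices made for the sketch.

```python
import numpy as np


def random_boosting(X, y, n_iter=200, subset_frac=0.1, shrinkage=0.1, seed=0):
    """Hypothetical sketch: L2 boosting on randomly sampled SNP subsets.

    X: (n_animals, n_snps) genotype codes; y: phenotypes or progeny proofs.
    Base learner (assumed): single-SNP least-squares regression on residuals.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = max(1, int(subset_frac * p))
    means = X.mean(axis=0)
    Xc = X - means                      # centre genotypes so base learners have zero mean
    ss = (Xc ** 2).sum(axis=0)          # per-SNP sum of squares
    intercept = y.mean()
    resid = y - intercept
    coef = np.zeros(p)
    for _ in range(n_iter):
        cols = rng.choice(p, size=k, replace=False)   # random subset of markers
        cols = cols[ss[cols] > 0]                     # skip monomorphic SNPs
        if cols.size == 0:
            continue
        b = Xc[:, cols].T @ resid / ss[cols]          # per-SNP OLS slopes on residuals
        # Residual sum of squares drops by b^2 * sum(x^2) for a single-SNP fit.
        best = np.argmax(b ** 2 * ss[cols])
        j = cols[best]
        step = shrinkage * b[best]                    # shrink the update
        coef[j] += step
        resid -= step * Xc[:, j]
    return intercept, coef, means


def predict(model, X):
    """Return direct genomic values for new genotypes."""
    intercept, coef, means = model
    return intercept + (X - means) @ coef
```

Sampling only a fraction of markers per iteration reduces the cost of each step and adds randomness to the ensemble, while shrinkage keeps each update small so that many SNPs can contribute.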

The predictive ability was measured in terms of correlations, mean square error (MSE) and regression coefficients between progeny proofs and direct genomic values (DGV) in the validation set. Genomic evaluations were more accurate than the traditional pedigree index. The increase in Pearson correlation between observed and predicted responses depended on the trait, but, as expected, the EG population predicted future progeny performance more accurately than ESP.
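The three validation criteria can be computed per trait as in the sketch below. It assumes, as is common in genomic validation studies, that the regression coefficient is the slope of progeny proofs regressed on DGV, so that a value of 1 indicates DGV on the same scale as the proofs; the abstract does not state the direction of the regression, and the function name is hypothetical.

```python
import numpy as np


def predictive_ability(progeny_proof, dgv):
    """Pearson r, MSE and regression slope of progeny proof on DGV (assumed direction)."""
    y = np.asarray(progeny_proof, dtype=float)
    yhat = np.asarray(dgv, dtype=float)
    r = np.corrcoef(y, yhat)[0, 1]                 # Pearson correlation
    mse = np.mean((y - yhat) ** 2)                 # mean square error
    # b = cov(y, yhat) / var(yhat); b < 1 suggests over-dispersed DGV, b > 1 under-dispersed.
    b = np.cov(y, yhat, ddof=1)[0, 1] / np.var(yhat, ddof=1)
    return {"r": r, "mse": mse, "b": b}
```

For example, DGV that are exactly half the progeny proofs give r = 1 but b = 2, showing why correlation alone does not reveal bias in the scale of the predictions.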

Both methodologies gave similar results. B-Lasso showed higher Pearson correlations for MY (0.590 vs 0.572), FY (0.655 vs 0.649) and PY (0.583 vs 0.545), whereas R-Boost showed a higher value for UD (0.584 vs 0.562).

Genomic predictions from R-Boost had a 4.03% lower predictive mean square error than those from B-Lasso. R-Boost showed smaller MSE for MY, PY and UD, whereas B-Lasso was preferred for FY in terms of MSE.
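A figure such as "4.03% lower MSE" is usually a relative difference rather than an absolute one. The sketch below shows that reading, assuming B-Lasso is the reference method; the abstract does not say whether the figure was pooled or averaged across traits, and the function name and input values are hypothetical.

```python
def relative_mse_reduction(mse_reference, mse_alternative):
    """Percentage by which mse_alternative is lower than mse_reference (assumed definition)."""
    return 100.0 * (mse_reference - mse_alternative) / mse_reference


# Hypothetical values, not results from this study.
print(round(relative_mse_reduction(100.0, 95.97), 2))
```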

R-Boost showed regression coefficients closer to 1 than B-Lasso.

The response to the different genomic evaluation methodologies was within the range of values expected for a population of similar size. The methods with higher Pearson correlations also showed larger MSE. This should be taken into account in model comparison studies when deciding which method has the better predictive ability.
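Why a higher correlation can coincide with a larger MSE is easy to see with hypothetical numbers (not data from this study): Pearson correlation ignores the location and scale of the predictions, whereas MSE penalises them. In the sketch, `pred_a` ranks the animals perfectly but is over-dispersed, while `pred_b` ranks them less well but stays on the scale of the observed values.

```python
import numpy as np


def pearson(y, yhat):
    return float(np.corrcoef(y, yhat)[0, 1])


def mse(y, yhat):
    return float(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))


# Hypothetical progeny proofs and two sets of predictions.
y = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
pred_a = 2.0 * y                                  # perfect ranking, twice the spread
pred_b = np.array([-1.5, -1.5, 0.5, 0.5, 2.0])    # imperfect ranking, similar spread

print(pearson(y, pred_a), mse(y, pred_a))         # r = 1.0, MSE = 2.0
print(pearson(y, pred_b), mse(y, pred_b))         # r about 0.95, MSE = 0.2
```

Here `pred_a` wins on correlation and loses on MSE, which is the kind of disagreement between criteria described above.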

Author Biographies

Jose Antonio Jiménez Montero, UPM

Animal Production Department

Oscar González-Recio, INIA

Animal Breeding Department

Rafael Alenda, UPM

Animal Production Department

Juan Pena, CONAFE

Technical Department

Published

2012-05-25