Assessing the predictive performance of the Bagging algorithm for genomic selection

Document Type: Research Article (Regular Paper)

Author

Department of Animal Science, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Iran

Abstract

The aim of the present study was to compare the predictive performance of the Bagging algorithm with that of other decision tree-based methods, namely regression tree (RT), random forest (RF) and Boosting, in genomic selection. A genome comprising ten chromosomes, on which 10,000 single nucleotide polymorphisms (SNPs) were evenly distributed, was simulated for 1,000 individuals. Quantitative trait locus (QTL) effects were assigned to 10% of the polymorphic SNPs, with effects sampled from a gamma distribution. The methods were compared using measures of predictive performance, including accuracy of prediction, reliability and bias. The computing time and memory requirements of the methods were also measured. In all methods, the accuracy of genomic evaluation increased as heritability increased from 0.10 to 0.50. Although RT was the most efficient in its use of time and memory, it was not recommended for genomic selection because of its poor predictive performance. The predictive performance of Bagging was equal to that of RF and higher than that of RT and Boosting. However, Bagging required significantly more computing time and memory. Considering overall performance, Bagging was recommended for genomic selection, especially when the size and structure of the genomic data limit the use of RF.
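The comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: it uses a toy data set (300 individuals, 50 unlinked SNPs coded 0/1/2) in place of the simulated ten-chromosome genome, and every function name and parameter value (gamma shape 0.4, tree depth 3, 25 bootstrap trees, minimum node sizes) is a hypothetical choice made for illustration. Accuracy is taken here to be the Pearson correlation between predicted and true breeding values in a validation set, which is an assumption, since the abstract does not define it.

```python
import random
import statistics


def simulate(n=300, p=50, h2=0.5, seed=1):
    """Toy data (assumed design): SNPs coded 0/1/2, gamma effects on 10% of SNPs."""
    rng = random.Random(seed)
    geno = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    effects = {j: rng.gammavariate(0.4, 1.0) for j in rng.sample(range(p), p // 10)}
    tbv = [sum(row[j] * a for j, a in effects.items()) for row in geno]
    # Residual SD chosen so that var(tbv) / var(phenotype) is roughly h2.
    sd_e = (statistics.pvariance(tbv) * (1 - h2) / h2) ** 0.5
    pheno = [g + rng.gauss(0, sd_e) for g in tbv]
    return geno, tbv, pheno


def sse(y, idx):
    """Sum of squared deviations of y[idx] from their mean."""
    m = sum(y[i] for i in idx) / len(idx)
    return sum((y[i] - m) ** 2 for i in idx)


def fit_tree(X, y, idx, depth):
    """Small regression tree: a leaf is a float, a node is (snp, cut, left, right)."""
    mean = sum(y[i] for i in idx) / len(idx)
    if depth == 0 or len(idx) < 10:
        return mean
    best = None
    for j in range(len(X[0])):
        for cut in (0.5, 1.5):  # the only useful cut points for 0/1/2 genotypes
            left = [i for i in idx if X[i][j] < cut]
            right = [i for i in idx if X[i][j] >= cut]
            if len(left) < 5 or len(right) < 5:
                continue
            total = sse(y, left) + sse(y, right)
            if best is None or total < best[0]:
                best = (total, j, cut, left, right)
    if best is None:
        return mean
    _, j, cut, left, right = best
    return (j, cut, fit_tree(X, y, left, depth - 1), fit_tree(X, y, right, depth - 1))


def predict_tree(tree, row):
    """Walk down the tree until a leaf (float) is reached."""
    while isinstance(tree, tuple):
        j, cut, left, right = tree
        tree = left if row[j] < cut else right
    return tree


def bagging(X, y, train, n_trees=25, depth=3, seed=2):
    """Fit one tree per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_tree(X, y, [rng.choice(train) for _ in train], depth)
            for _ in range(n_trees)]


def predict_bagging(trees, row):
    """Bagged prediction: average of the individual tree predictions."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5 if saa > 0 and sbb > 0 else 0.0


geno, tbv, pheno = simulate()
train, valid = list(range(200)), list(range(200, 300))
true_valid = [tbv[i] for i in valid]

trees = bagging(geno, pheno, train)
single = fit_tree(geno, pheno, train, 3)

acc_bag = pearson([predict_bagging(trees, geno[i]) for i in valid], true_valid)
acc_tree = pearson([predict_tree(single, geno[i]) for i in valid], true_valid)
print(f"accuracy, single tree: {acc_tree:.3f}")
print(f"accuracy, bagging:     {acc_bag:.3f}")
```

Rerunning `simulate` with a different `h2` (for example 0.10) gives a rough view of how accuracy responds to heritability in this toy setting. The values it prints say nothing about the magnitudes reported in the study.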
 

